SERIES PREFACE



In the Essentials of Behavioral Science series, our goal is to provide readers
with books that will deliver key practical information in an efficient, ac- 
cessible style. The series features books on a variety of topics, such as 
statistics, psychological testing, and research design and methodology, to 
name just a few. For the experienced professional, books in the series offer 
a concise yet thorough review of a specific area of expertise, including nu- 
merous tips for best practices. Students can turn to series books for a clear 
and concise overview of the important topics in which they must become 
proficient to practice skillfully, efficiently, and ethically in their chosen 
fields. 

Wherever feasible, visual cues highlighting key points are utilized 
alongside systematic, step-by-step guidelines. Chapters are focused and 
succinct. Topics are organized for an easy understanding of the essential 
material related to a particular topic. Theory and research are continually 
woven into the fabric of each book, but always to enhance the practical 
application of the material, rather than to sidetrack or overwhelm readers. 
With this series, we aim to challenge and assist readers in the behavioral 
sciences to aspire to the highest level of competency by arming them with 
the tools they need for knowledgeable, informed practice. 

The purposes of Essentials of Research Design and Methodology are to dis- 
cuss the various types of research designs that are commonly used, the ba- 
sic process by which research studies are conducted, the research-related 
considerations of which researchers should be aware, the manner in which 
the results of research can be interpreted and disseminated, and the typical
pitfalls faced by researchers when designing and conducting a research
study. This book is ideal for those readers with minimal knowledge of re- 
search as well as for those readers with intermediate knowledge who need 
a quick refresher regarding particular aspects of research design and 
methodology. For those readers with an advanced knowledge of research 
design and methodology, this book can be used as a concise summary of 
basic research techniques and principles, or as an adjunct to a more ad- 
vanced research methodology and design textbook. Finally, even for those 
readers who do not conduct research, this book will become a valuable 
addition to your bookcase because it will assist you in becoming a more 
educated consumer of research. Being able to evaluate the appropriateness
of a research design or the conclusions drawn from a particular research
study will become increasingly important as research becomes more
accessible to nonscientists. In that regard, this book will improve your
ability to efficiently and effectively digest and understand the results
of a research study.

Alan S. Kaufman, PhD, and Nadeen L. Kaufman, EdD, Founding Editors
Yale University School of Medicine 
One
INTRODUCTION AND OVERVIEW 



Progress in almost every field of science depends on the contribu- 
tions made by systematic research; thus research is often viewed as 
the cornerstone of scientific progress. Broadly defined, the purpose 
of research is to answer questions and acquire new knowledge. Research 
is the primary tool used in virtually all areas of science to expand the fron- 
tiers of knowledge. For example, research is used in such diverse scientific 
fields as psychology, biology, medicine, physics, and botany, to name just 
a few of the areas in which research makes valuable contributions to what 
we know and how we think about things. Among other things, by con- 
ducting research, researchers attempt to reduce the complexity of prob- 
lems, discover the relationship between seemingly unrelated events, and 
ultimately improve the way we live. 

Although research studies are conducted in many diverse fields of sci- 
ence, the general goals and defining characteristics of research are typically 
the same across disciplines. For example, across all types of science, re- 
search is frequently used for describing a thing or event, discovering the 
relationship between phenomena, or making predictions about future 
events. In short, research can be used for the purposes of description, ex- 
planation, and prediction, all of which make important and valuable con- 
tributions to the expansion of what we know and how we live our lives. In 
addition to sharing similar broad goals, scientific research in virtually all 
fields of study shares certain defining characteristics, including testing 
hypotheses, careful observation and measurement, systematic evaluation 
of data, and drawing valid conclusions. 

In recent years, the results of various research studies have taken center
stage in the popular media. No longer is research the private domain of re- 
search professors and scientists wearing white lab coats. To the contrary, 
the results of research studies are frequently reported on the local evening 
news, CNN, the Internet, and various other media outlets that are acces- 
sible to both scientists and nonscientists alike. For example, in recent 
years, we have all become familiar with research regarding the effects of 
stress on our psychological well-being, the health benefits of a low- 
cholesterol diet, the effects of exercise in preventing certain forms of can- 
cer, which automobiles are safest to drive, and the deleterious effects of 
pollution on global warming. We may have even become familiar with re- 
search studies regarding the human genome, the Mars Land Rover, the use 
of stem cells, and genetic cloning. Not too long ago, it was unlikely that the 
results of such highly scientific research studies would have been shared 
with the general public to such a great extent. 

Despite the accessibility and prevalence of research in today's society, 
many people share common misperceptions about exactly what research 
is, how research can be used, what research can tell us, and the limitations 
of research. For some people, the term "research" conjures up images of 
scientists in laboratories watching rats run through mazes or mixing 
chemicals in test tubes. For other people, the term "research" is associated 
with telemarketer surveys, or people approaching them at the local shop- 
ping mall to "just ask you a few questions about your shopping habits." In 
actuality, these stereotypical examples of research are only a small part of 
what research comprises. It is therefore not surprising that many people 
are unfamiliar with the various types of research designs, the basics of how 
research is conducted, what research can be used for, and the limits of us- 
ing research to answer questions and acquire new knowledge. Rapid Ref- 
erence 1.1 discusses what we mean by "research" from a scientific per- 
spective. 

Before addressing these important issues, however, we should first 
briefly review what science is and how it goes about telling us what we 
know. 

Rapid Reference 1.1

What Exactly is Research?

Research studies come in many different forms, and we will discuss several
of these forms in more detail in Chapter 5. For now, however, we will focus
on two of the most common types of research: correlational research and
experimental research.

Correlational research: In correlational research, the goal is to determine
whether two or more variables are related. (By the way, "variables" is a
term with which you should be familiar. A variable is anything that can take
on different values, such as weight, time, and height.) For example, a
researcher may be interested in determining whether age is related to
weight. In this example, a researcher may discover that age is indeed
related to weight because as age increases, weight also increases. If a
correlation between two variables is strong enough, knowing about one
variable allows a researcher to make a prediction about the other variable.
There are several different types of correlations, which will be discussed
in more detail in Chapter 5. It is important to point out, however, that a
correlation (or relationship) between two things does not necessarily mean
that one thing caused the other. To draw a cause-and-effect conclusion,
researchers must use experimental research. This point will be emphasized
throughout this book.
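
To make the age-and-weight example concrete, the following is a minimal sketch of how a correlation between two variables could be computed. The function name `pearson_r` and the data are hypothetical and invented for illustration; the Pearson coefficient is only one of the several types of correlation mentioned above, and it is used here as an assumption rather than as the method this book prescribes.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of each value's deviation from its mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical ages (years) and weights (kg) for six children; invented data.
ages = [2, 4, 6, 8, 10, 12]
weights = [12.0, 16.5, 20.0, 26.0, 31.5, 40.0]

r = pearson_r(ages, weights)
print(f"r = {r:.2f}")
```

A value of r close to +1 indicates that weight tends to rise as age rises. As the text stresses, even a value this high would not by itself show that one variable causes the other.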

Experimental research: In its simplest form, experimental research in- 
volves comparing two groups on one outcome measure to test some hy- 
pothesis regarding causation. For example, if a researcher is interested in 
the effects of a new medication on headaches, the researcher would ran- 
domly divide a group of people with headaches into two groups. One of 
the groups, the experimental group, would receive the new medication be- 
ing tested. The other group, the control group, would receive a placebo 
medication (i.e., a medication containing a harmless substance, such as 
sugar, that has no physiological effects). Besides receiving the different 
medications, the groups would be treated exactly the same so that the re- 
search could isolate the effects of the medications. After receiving the 
medications, both groups would be compared to see whether people in 
the experimental group had fewer headaches than people in the control 
group. Assuming this study was properly designed (and properly designed 
studies will be discussed in detail in later chapters), if people in the experi- 
mental group had fewer headaches than people in the control group, the 
researcher could conclude that the new medication reduces headaches. 
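
The random division into an experimental group and a control group can be sketched in a few lines. This is an illustrative sketch only: the participant IDs, the headache counts, and the helper name `randomly_assign` are all hypothetical, and the fixed seed is there just to make the example repeatable.

```python
import random
from statistics import mean

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs for 20 people with headaches.
participants = [f"P{i:02d}" for i in range(1, 21)]
experimental, control = randomly_assign(participants, seed=42)

# Hypothetical headache counts recorded after treatment (invented numbers).
headaches_experimental = [2, 1, 3, 0, 2, 1, 2, 1, 0, 2]
headaches_control = [4, 3, 5, 2, 4, 3, 4, 5, 3, 4]

difference = mean(headaches_control) - mean(headaches_experimental)
print(f"Control minus experimental mean headaches: {difference:.1f}")
```

A difference in group means is only a starting point. Whether it is large enough to support a conclusion is a statistical question, which is taken up later in this chapter.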

OVERVIEW OF SCIENCE AND THE SCIENTIFIC METHOD

In simple terms, science can be defined as a methodological and systematic
approach to the acquisition of new knowledge. This definition of science 
highlights some of the key differences between how scientists and non- 
scientists go about acquiring new knowledge. Specifically, rather than 
relying on mere casual observations and an informal approach to learn 
about the world, scientists attempt to gain new knowledge by making care- 
ful observations and using systematic, controlled, and methodical ap- 
proaches (Shaughnessy & Zechmeister, 1997). By doing so, scientists are 
able to draw valid and reliable conclusions about what they are studying. 
In addition, scientific knowledge is not based on the opinions, feelings, or 
intuition of the scientist. Instead, scientific knowledge is based on objec- 
tive data that were reliably obtained in the context of a carefully designed 
research study. In short, scientific knowledge is based on the accumulation 
of empirical evidence (Kazdin, 2003a), which will be the topic of a great 
deal of discussion in later chapters of this book. 

The defining characteristic of scientific research is the scientific
method (summarized in Rapid Reference 1.2). First described by the English
philosopher and scientist Roger Bacon in the 13th century, the scientific
method is still generally regarded as the basis for all scientific
investigation. The scientific method is best thought of as an approach to the
acquisition of new knowledge, and this approach effectively distinguishes 
science from nonscience. To be clear, the scientific method is not actually 
a single method, as the name would erroneously lead one to believe, but 
rather an overarching perspective on how scientific investigations should 
proceed. It is a set of research principles and methods that helps re- 
searchers obtain valid results from their research studies. Because the sci- 
entific method deals with the general approach to research rather than the 
content of specific research studies, it is used by researchers in all different 
scientific disciplines. As will be seen in the following sections, the biggest 
benefit of the scientific method is that it provides a set of clear and agreed- 
upon guidelines for gathering, evaluating, and reporting information in 
the context of a research study (Cozby, 1993). 

Rapid Reference 1.2

The Scientific Method

The development of the scientific method is usually credited to Roger
Bacon, a philosopher and scientist from 13th-century England, although
some argue that the Italian scientist Galileo Galilei played an important
role in formulating the scientific method. Later contributions to the
scientific method were made by the philosophers Francis Bacon and René
Descartes. Although some disagreement exists regarding the exact
characteristics of the scientific method, most agree that it is
characterized by the following elements:

• Empirical approach 

• Observations 

• Questions 

• Hypotheses 

• Experiments 

• Analyses 

• Conclusions 

• Replication 

There has been some disagreement among researchers over the years 
regarding the elements that compose the scientific method. In fact, some 
researchers have even argued that it is impossible to define a universal ap- 
proach to scientific investigation. Nevertheless, for over 100 years, the 
scientific method has been the defining feature of scientific research. Re- 
searchers generally agree that the scientific method is composed of the 
following key elements (which will be the focus of the remainder of this 
chapter): an empirical approach, observations, questions, hypotheses, ex- 
periments, analyses, conclusions, and replication. 

Before proceeding any further, one word of caution is necessary. In the 
brief discussion of the scientific method that follows, we will be introduc- 
ing several new terms and concepts that are related to research design and 
methodology. Do not be intimidated if you are unfamiliar with some of the 
content contained in this discussion. The purpose of the following is simply
to set the stage for the chapters that follow, and we will be elaborating on
each of the terms and concepts throughout the remainder of the book. 

Empirical Approach 

The scientific method is firmly based on the empirical approach. The em- 
pirical approach is an evidence-based approach that relies on direct obser- 
vation and experimentation in the acquisition of new knowledge (see 
Kazdin, 2003a). In the empirical approach, scientific decisions are made 
based on the data derived from direct observation and experimentation. 
Contrast this approach to decision making with the way that most nonsci- 
entific decisions are made in our daily lives. For example, we have all made 
decisions based on feelings, hunches, or "gut" instinct. Additionally, we 
may often reach conclusions or make decisions that are not necessarily 
based on data, but rather on opinions, speculation, and a hope for the best. 
The empirical approach, with its emphasis on direct, systematic, and care- 
ful observation, is best thought of as the guiding principle behind all re- 
search conducted in accordance with the scientific method. 

Observations 

An important component in any scientific investigation is observation. In 
this sense, observation refers to two distinct concepts — being aware of the 
world around us and making careful measurements. Observations of the 
world around us often give rise to the questions that are addressed through 
scientific research. For example, the Newtonian observation that apples 
fall from trees stimulated much research into the effects of gravity. There- 
fore, a keen eye to your surroundings can often provide you with many 
ideas for research studies. We will discuss the generation of research ideas 
in more detail in Chapter 2. 

In the context of science, observation means more than just observing 
the world around us to get ideas for research. Observation also refers to the 
process of making careful and accurate measurements, which is a distin- 
guishing feature of well-conducted scientific investigations. When making 
measurements in the context of research, scientists typically take great
precautions to avoid making biased observations. For example, if a re- 
searcher is observing the amount of time that passes between two events, 
such as the length of time that elapses between lightning and thunder, it 
would certainly be advisable for the researcher to use a measurement de- 
vice that has a high degree of accuracy and reliability. Rather than simply 
trying to "guesstimate" the amount of time that elapsed between those 
two events, the researcher would be advised to use a stopwatch or similar 
measurement device. By doing so, the researcher ensures that the mea- 
surement is accurate and not biased by extraneous factors. Most people 
would likely agree that the observations that we make in our daily lives are 
rarely made so carefully or systematically. 

An important aspect of measurement is an operational definition. Re- 
searchers define key concepts and terms in the context of their research 
studies by using operational definitions. By using operational definitions, 
researchers ensure that everyone is talking about the same phenomenon. 
For example, if a researcher wants to study the effects of exercise on stress 
levels, it would be necessary for the researcher to define what "exercise" 
is. Does exercise refer to jogging, weight lifting, swimming, jumping rope, 
or all of the above? By defining "exercise" for the purposes of the study, 
the researcher makes sure that everyone is referring to the same thing. 
Clearly, the definition of "exercise" can differ from one study to another, 
so it is crucial that the researcher define "exercise" in a precise manner in 
the context of his or her study. Having a clear definition of terms also 
ensures that the researcher's study can be replicated by other researchers. 
The importance of operational definitions will be discussed further in 
Chapter 2. 



Questions 

After getting a research idea, perhaps from making observations of the 
world around us, the next step in the research process involves translating 
that research idea into an answerable question. The term "answerable" is 
particularly important in this respect, and it should not be overlooked. It 
would obviously be a frustrating and ultimately unrewarding endeavor to
attempt to answer an unanswerable research question through scientific 
investigation. An example of an unanswerable research question is the fol- 
lowing: "Is there an exact replica of me in another universe?" Although 
this is certainly an intriguing question that would likely yield important in- 
formation, the current state of science cannot provide an answer to that 
question. It is therefore important to formulate a research question that 
can be answered through available scientific methods and procedures. 
One might ask, for example, whether exercising (perhaps operationally
defined as running three times per week for 30 minutes each time)
reduces cholesterol levels. This question could be researched and an- 
swered using established scientific methods. 
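
An operational definition such as the one in the example above can be written down as an explicit rule, which is one way to see why it makes a question answerable. The sketch below is hypothetical: the names `WeeklyRunningLog` and `meets_exercise_definition` are invented, and it assumes that "three times per week for 30 minutes each time" means at least three sessions of at least 30 minutes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WeeklyRunningLog:
    """One participant's running sessions for a week, in minutes per session."""
    session_minutes: tuple

def meets_exercise_definition(log, min_sessions=3, min_minutes=30):
    """Return True if the week satisfies the operational definition of exercise."""
    # Assumption: only sessions lasting at least min_minutes count.
    qualifying = [m for m in log.session_minutes if m >= min_minutes]
    return len(qualifying) >= min_sessions

week_a = WeeklyRunningLog((30, 35, 40))   # three qualifying sessions
week_b = WeeklyRunningLog((45, 20, 30))   # only two qualifying sessions

print(meets_exercise_definition(week_a))
print(meets_exercise_definition(week_b))
```

Because the rule is explicit, another researcher could apply the same definition to new data, which is what makes replication possible.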

Hypotheses 

The next step in the scientific method is coming up with a hypothesis, which 
is simply an educated — and testable — guess about the answer to your 
research question. A hypothesis is often described as an attempt by the re- 
searcher to explain the phenomenon of interest. Hypotheses can take var- 
ious forms, depending on the question being asked and the type of study 
being conducted (see Rapid Reference 1.3). 

A key feature of all hypotheses is that each must make a prediction. Re- 
member that hypotheses are the researcher's attempt to explain the phe- 
nomenon being studied, and that explanation should involve a prediction 
about the variables being studied. These predictions are then tested by 
gathering and analyzing data, and the hypotheses can either be supported 
or refuted (falsified; see Rapid Reference 1.4) on the basis of the data. 

In their simplest forms, hypotheses are typically phrased as "if-then" 
statements. For example, a researcher may hypothesize that "if people
exercise for 30 minutes per day at least three days per week, then their
cholesterol levels will be reduced." This hypothesis makes a prediction about
the effects of exercising on levels of cholesterol, and the prediction can be 
tested by gathering and analyzing data. 

Rapid Reference 1.3

Relationship Between Hypotheses and Research Design

Hypotheses can take many different forms depending on the type of research
design being used. Some hypotheses may simply describe how two things may
be related. For example, in correlational research (which will be discussed
in Chapter 5), a researcher might hypothesize that alcohol intoxication is
related to poor decision making. In other words, the researcher is
hypothesizing that there is a relationship between using alcohol and
decision-making ability (but not necessarily a causal relationship).
However, in a study using a randomized controlled design (which will also
be discussed in Chapter 5), the researcher might hypothesize that using
alcohol causes poor decision making. Therefore, as may be evident, the
hypothesis being tested by a researcher is largely dependent on the type of
research design being used. The relationship between hypotheses and
research design will be discussed in more detail in later chapters.

Rapid Reference 1.4

Falsifiability of Hypotheses

According to the 20th-century philosopher Karl Popper, hypotheses must be
falsifiable (Popper, 1963). In other words, the researcher must be able to
demonstrate that the hypothesis is wrong. If a hypothesis is not
falsifiable, then science cannot be used to test the hypothesis. For
example, hypotheses based on religious beliefs are not falsifiable.
Therefore, because we can never prove that faith-based hypotheses are
wrong, there would be no point in conducting research to test them. Another
way of saying this is that the researcher must be able to reject the
proposed explanation (i.e., hypothesis) of the phenomenon being studied.

Two types of hypotheses with which you should be familiar are the null
hypothesis and the alternate (or experimental) hypothesis. The null
hypothesis always predicts that there will be no differences between the
groups being studied. By contrast, the alternate hypothesis predicts that
there will be a difference between the groups. In our example, the null
hypothesis would predict that the exercise group and the no-exercise group
will not differ significantly on levels of cholesterol. The alternate
hypothesis would predict that the two groups will differ significantly on
cholesterol levels. Hypotheses will be discussed in more detail in
Chapter 2.

Experiments 

After articulating the hypothesis, the next step involves actually conduct- 
ing the experiment (or research study). For example, if the study involves 
investigating the effects of exercise on levels of cholesterol, the researcher 
would design and conduct a study that would attempt to address that ques- 
tion. As previously mentioned, a key aspect of conducting a research study 
is measuring the phenomenon of interest in an accurate and reliable manner 
(see Rapid Reference 1.5). In this example, the researcher would collect 
data on the cholesterol levels of the study participants by using an accurate 
and reliable measurement device. Then, the researcher would compare the 
cholesterol levels of the two groups to see if exercise had any effects. 

Rapid Reference 1.5

Accuracy vs. Reliability

When talking about measurement in the context of research, there is an
important distinction between being accurate and being reliable. Accuracy
refers to whether the measurement is correct, whereas reliability refers to
whether the measurement is consistent. An example may help to clarify the
distinction. When throwing darts at a dart board, "accuracy" refers to
whether the darts are hitting the bull's eye (an accurate dart thrower will
throw darts that hit the bull's eye). "Reliability," on the other hand,
refers to whether the darts are hitting the same spot (a reliable dart
thrower will throw darts that hit the same spot). Therefore, an accurate
and reliable dart thrower will consistently throw the darts in the bull's
eye. As may be evident, however, it is possible for the dart thrower to be
reliable, but not accurate. For example, the dart thrower may throw all of
the darts in the same spot (which demonstrates high reliability), but that
spot may not be the bull's eye (which demonstrates low accuracy). In the
context of measurement, accuracy and reliability are equally important.
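
The distinction between accuracy and reliability can also be expressed numerically. The sketch below is one possible way to do so, under stated assumptions: accuracy is summarized as the average error relative to a known true value, and reliability as the spread (sample standard deviation) of repeated readings. The readings, the true value, and the function name `describe_measurements` are hypothetical.

```python
from statistics import mean, stdev

def describe_measurements(readings, true_value):
    """Return (bias, spread) for a set of repeated measurements."""
    bias = mean(readings) - true_value   # near 0 suggests an accurate measure
    spread = stdev(readings)             # near 0 suggests a reliable measure
    return bias, spread

TRUE_DELAY = 3.0  # hypothetical true lightning-to-thunder delay, in seconds

# Invented stopwatch readings from two hypothetical observers.
accurate_and_reliable = [3.0, 3.1, 2.9, 3.0, 3.0]
reliable_not_accurate = [4.0, 4.1, 3.9, 4.0, 4.0]

bias_1, spread_1 = describe_measurements(accurate_and_reliable, TRUE_DELAY)
bias_2, spread_2 = describe_measurements(reliable_not_accurate, TRUE_DELAY)

print(f"Observer 1: bias={bias_1:+.2f}s, spread={spread_1:.2f}s")
print(f"Observer 2: bias={bias_2:+.2f}s, spread={spread_2:.2f}s")
```

Both observers are about equally consistent, but the second observer's readings cluster around the wrong value, which corresponds to the dart thrower who hits the same spot away from the bull's eye.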

Analyses

After conducting the study and gathering the data, the next step involves 
analyzing the data, which generally calls for the use of statistical tech- 
niques. The type of statistical techniques used by a researcher depends on 
the design of the study, the type of data being gathered, and the questions 
being asked. Although a detailed discussion of statistics is beyond the 
scope of this text, it is important to be aware of the role of statistics in con- 
ducting a research study. In short, statistics help researchers minimize the 
likelihood of reaching an erroneous conclusion about the relationship be- 
tween the variables being studied. 

A key decision that researchers must make with the assistance of statis- 
tics is whether the null hypothesis should be rejected. Remember that the 
null hypothesis always predicts that there will be no difference between the 
groups. Therefore, rejecting the null hypothesis means that there is a
difference between the groups. In general, most researchers seek to reject the
null hypothesis because rejection means the phenomenon being studied 
(e.g., exercise, medication) had some effect. 

It is important to note that there are only two choices with respect to 
the null hypothesis. Specifically, the null hypothesis can be either rejected 
or not rejected, but it can never be accepted. If we reject the null hypoth- 
esis, we are concluding that there is a significant difference between the 
groups. If, however, we do not reject the null hypothesis, then we are con- 
cluding that we were unable to detect a difference between the groups. To 
be clear, it does not mean that there is no difference between the two 
groups. There may in actuality have been a significant difference between 
the two groups, but we were unable to detect that difference in our study. 
We will talk more about this important distinction in later chapters. 

The decision of whether to reject the null hypothesis is based on the 
results of statistical analyses, and there are two types of errors that re- 
searchers must be careful to avoid when making this decision — Type I er- 
rors and Type II errors. A Type I error occurs when a researcher concludes 
that there is a difference between the groups being studied when, in fact, 
there is no difference. This is sometimes referred to as a "false positive." 
By contrast, a Type II error occurs when the researcher concludes that there
is not a difference between the two groups being studied when, in fact, 
there is a difference. This is sometimes referred to as a "false negative." As 
previously noted, the conclusion regarding whether there is a difference 
between the groups is based on the results of statistical analyses. Specifi- 
cally, with a Type I error, although there is a statistically significant result, 
it occurred by chance (or error) and there is not actually a difference be- 
tween the two groups (Wampold, Davis, & Good, 2003). With a Type II 
error, there is a nonsignificant statistical result when, in fact, there actually 
is a difference between the two groups (Wampold et al.). 

The typical convention in most fields of science allows for a 5% chance 
of erroneously rejecting the null hypothesis (i.e., of making a Type I error). 
In other words, a researcher will conclude that there is a significant differ- 
ence between the groups being studied (i.e., will reject the null hypothesis) 
only if the chance of being incorrect is less than 5%. For obvious reasons, 
researchers want to reduce the likelihood of concluding that there is a sig- 
nificant difference between the groups being studied when, in fact, there 
is not a difference. 
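
The decision rule described above (reject the null hypothesis only when the result would be unlikely under it, using a 5% threshold) can be illustrated with a small sketch. The text does not name a specific statistical test, so the permutation test used here is a swapped-in technique chosen because it needs no formulas beyond a difference in means. The cholesterol values and the function name `permutation_p_value` are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean_gap(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_gap(pooled)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and see how large a gap arises by chance.
        rng.shuffle(pooled)
        if mean_gap(pooled) >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations

ALPHA = 0.05  # the conventional 5% threshold discussed in the text

# Hypothetical cholesterol levels (mg/dL); invented for illustration.
exercise = [182, 175, 190, 168, 171, 185, 178, 166]
no_exercise = [205, 198, 214, 189, 201, 210, 195, 207]

p_value = permutation_p_value(exercise, no_exercise)
decision = "reject" if p_value < ALPHA else "do not reject"
print(f"p = {p_value:.4f}: {decision} the null hypothesis")
```

Note that the only two outcomes the code can report are "reject" and "do not reject", mirroring the point that the null hypothesis is never accepted.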

The distinction between Type I and Type II errors is very important, 
although somewhat complicated. An example may help to clarify these 
terms. In our example, a researcher conducts a study to determine whether 
a new medication is effective in treating depression. The new medication 
is given to Group 1, while a placebo medication is given to Group 2. If, at 
the conclusion of the study, the researcher concludes that there is a signif- 
icant difference in levels of depression between Groups 1 and 2 when, in 
fact, there is no difference, the researcher has made a Type I error. In sim- 
pler terms, the researcher has detected a difference between the groups 
that in actuality does not exist; the difference between the groups occurred 
by chance (or error). By contrast, if the researcher concludes that there is 
no significant difference in levels of depression between Groups 1 and 2 
when, in fact, there is a difference, the researcher has made a Type II er- 
ror. In simpler terms, the researcher has failed to detect a difference that 
actually exists between the groups. 

Which type of error is more serious — Type I or Type II? The answer to 
this question often depends on the context in which the errors are made.
Let's use the medical context as an example. If a doctor diagnoses a patient 
with cancer when, in fact, the patient does not have cancer (i.e., a false pos- 
itive), the doctor has committed a Type I error. In this situation, it is likely 
that the erroneous diagnosis will be discovered (perhaps through a second 
opinion) and the patient will undoubtedly be relieved. If, however, the 
doctor gives the patient a clean bill of health when, in fact, the patient ac- 
tually has cancer (i.e., a false negative), the doctor has committed a Type II 
error. Most people would likely agree that a Type II error would be more 
serious in this example because it would prevent the patient from getting 
necessary medical treatment. 

CAUTION 

Type I Errors vs. Type II Errors 

Type I Error (false positive): Concluding there is a difference between 
the groups being studied when, in fact, there is no difference. 

Type II Error (false negative): Concluding there is no difference between 
the groups being studied when, in fact, there is a difference. 

Type I and Type II errors can be illustrated using the following table: 

                             Actual Results 
Researcher's Conclusion      Difference          No Difference 
Difference                   Correct decision    Type I error 
No difference                Type II error       Correct decision 

You may be wondering why researchers do not simply set up their research 
studies so that there is even less chance of making a Type I error. For 
example, wouldn't it make sense for researchers to set up their research 
studies so that the chance of making a Type I error is less than 1% or, 
better yet, 0%? The reason that researchers do not set up their studies 
in this manner has to do with the relationship between making Type I 
errors and making Type II errors. Specifically, all else being equal 
(e.g., for a given sample size), there is an inverse relationship between 
Type I errors and Type II errors, which means that by decreasing the 
probability of making a Type I error, the researcher is increasing the 
probability of making a Type II error. In other words, if a researcher 
reduces the probability of making a Type I error from 5% to 1%, there is 
now an increased probability that the researcher will make a Type II error 
by failing to detect a difference that actually exists. The 5% level is a 
standard convention in most fields of research and represents a compromise 
between making Type I and Type II errors. 
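
To make the trade-off concrete, the following minimal sketch approximates 
the Type II error rate for a two-group comparison at two significance 
levels. It assumes a two-sided z-test with equal group sizes; the function 
name `type_ii_error_rate`, the effect size (0.5 standard deviations), and 
the group size (50) are hypothetical values chosen for illustration and do 
not come from the text. 

```python
from statistics import NormalDist


def type_ii_error_rate(alpha, effect_size, n_per_group):
    """Approximate P(Type II error) for a two-sided, two-sample z-test.

    alpha        -- chosen probability of a Type I error (e.g., 0.05)
    effect_size  -- true difference between group means, in standard deviations
    n_per_group  -- number of participants in each group
    """
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the difference really exists.
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    # Power: probability that the statistic lands beyond either critical value.
    power = (1 - standard_normal.cdf(critical_z - expected_z)) + standard_normal.cdf(
        -critical_z - expected_z
    )
    return 1 - power


if __name__ == "__main__":
    # Hypothetical scenario: medium-sized effect, 50 participants per group.
    for alpha in (0.05, 0.01):
        beta = type_ii_error_rate(alpha, effect_size=0.5, n_per_group=50)
        print(f"alpha = {alpha:.2f} -> approximate Type II error rate = {beta:.2f}")
```

Under these assumptions, tightening the Type I error rate from 5% to 1% 
raises the approximate Type II error rate, which is the inverse 
relationship described above. 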

Conclusions 

After analyzing the data and determining whether to reject the null hy- 
pothesis, the researcher is now in a position to draw some conclusions 
about the results of the study. For example, if the researcher rejected the 
null hypothesis, the researcher can conclude that the phenomenon being 
studied had an effect — a statistically significant effect, to be more precise. If 
the researcher rejects the null hypothesis in our exercise-cholesterol ex- 
ample, the researcher is concluding that exercise had an effect on levels of 
cholesterol. 

It is important that researchers make only those conclusions that can be 
supported by the data analyses. Going beyond the data is a cardinal sin that 
researchers must be careful to avoid. For example, if a researcher con- 
ducted a correlational study and the results indicated that the two things 
being studied were strongly related, the researcher could not conclude that 
one thing caused the other. An oft-repeated statement that will be ex- 
plained in later chapters is that correlation (i.e., a relationship between two 
things) does not equal causation. In other words, the fact that two things 
are related does not mean that one caused the other. 

DON'T FORGET 

Correlation Does Not Equal Causation 

Before looking at an example of why correlation does not equal causation, 
let's make sure that we understand what a correlation is. A correlation 
is simply a relationship between two things. For example, size and weight 
are often correlated because there is a relationship between the size of 
something and its weight. Specifically, bigger things tend to weigh more. 
The results of correlational studies simply provide researchers with 
information regarding the relationship between two or more variables, 
which may serve as the basis for future studies. It is important, however, 
that researchers interpret this relationship cautiously. 

For example, if a researcher finds that eating ice cream is correlated with 
(i.e., related to) higher rates of drowning, the researcher cannot conclude 
that eating ice cream causes drowning. It may be that another variable is 
responsible for the higher rates of drowning. For example, most ice cream 
is eaten in the summer and most swimming occurs in the summer. Therefore, 
the higher rates of drowning are not caused by eating ice cream, but 
rather by the increased number of people who swim during the summer. 

Replication 

One of the most important elements of the scientific method is replication. 
Replication essentially means conducting the same research study a second 
time with another group of participants to see whether the same results 
are obtained (see Kazdin, 1992; Shaughnessy & Zechmeister, 1997). The same 
researcher may attempt to replicate previously obtained results, or 
perhaps other researchers may undertake that task. Replication illustrates 
an important point about scientific research: namely, that researchers 
should avoid drawing broad conclusions based on the results of a single 
research study because it is always possible that the results of that 
particular study were an aberration. In other words, it is possible that 
the results of the research study were obtained by chance or error and, 
therefore, that the results may not accurately represent the actual state 
of things. However, if the results of a research study are obtained a 
second time (i.e., replicated), the likelihood that the original study's 
findings were obtained by chance or error is greatly reduced. 
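
As a rough illustration of why replication reduces the role of chance, the 
sketch below computes how often a set of studies would all produce a false 
positive if no real effect existed. It assumes that the studies are 
independent and that each uses the same Type I error rate; the function 
name `chance_of_repeated_false_positive` is hypothetical and is used here 
only for illustration. 

```python
def chance_of_repeated_false_positive(alpha, n_studies):
    """Probability that n independent studies all reject a true null hypothesis."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    if n_studies < 1:
        raise ValueError("n_studies must be at least 1")
    return alpha ** n_studies


if __name__ == "__main__":
    # Assumed Type I error rate of 5% for every study.
    for n_studies in (1, 2, 3):
        probability = chance_of_repeated_false_positive(0.05, n_studies)
        print(f"{n_studies} independent study/studies: {probability:.6f}")
```

Under these assumptions, a 5% chance of a false positive in a single study 
drops to 0.25% for an original study plus one replication. Real studies 
are rarely perfectly independent, so this figure is best read as an 
idealized lower bound. 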

The importance of replication in research cannot be overstated. 
Replication serves several integral purposes, including establishing the 
reliability (i.e., consistency) of the research study's findings and 
determining whether the same results can be obtained with a different 
group of participants. This last point refers to whether the results of 
the original study are generalizable to other groups of research 
participants. If the results of a study are replicated, the researchers, 
and the field in which they work, can have greater confidence in the 
reliability and generalizability of the original findings. 

GOALS OF SCIENTIFIC RESEARCH 

As stated previously, the goals of scientific research, in broad terms, are to 
answer questions and acquire new knowledge. This is typically accom- 
plished by conducting research that permits drawing valid inferences 
about the relationship between two or more variables (Kazdin, 1992). In 
later chapters, we discuss the specific techniques that researchers use to 
ensure that valid inferences can be drawn from their research, and in Rapid 
References 1.6 and 1.7 we present some research-related terms you should 
become familiar with. For now, however, our main discussion will focus 
on the goals of scientific research in more general terms. Most researchers 
agree that the three general goals of scientific research are description, 
prediction, and understanding/explanation (Cozby, 1993; Shaughnessy & 
Zechmeister, 1997). 



Description 

Perhaps the most basic and easily understood goal of scientific research is 
description. In short, description refers to the process of defining, classify- 
ing, or categorizing phenomena of interest. For example, a researcher may 
wish to conduct a research study that has the goal of describing the rela- 
tionship between two things or events, such as the relationship between 
cardiovascular exercise and levels of cholesterol. Alternatively, a re- 
searcher may be interested in describing a single phenomenon, such as the 
effects of stress on decision making. 

Descriptive research is useful because it can provide important 
information regarding the average member of a group. Specifically, by 
gathering data on a large enough group of people, a researcher can 
describe the average member, or the average performance of a member, of 
the particular group being studied. Perhaps a brief example will help 
clarify what we mean by this. Let's say a researcher gathers Scholastic 
Aptitude Test (SAT) scores from the current freshman class at a 
prestigious university. By using some simple statistical techniques, the 
researcher would be able to calculate the average SAT score for the 
current freshman class at the university. This information would likely 
be informative for high school students who are considering applying for 
admission to the university. 

One example of descriptive research is correlational research. In 
correlational research (as mentioned earlier), the researcher attempts to 
determine whether there is a relationship, that is, a correlation, between 
two or more variables (see Rapid Reference 1.8 for two types of 
correlation). For example, a researcher may wish to determine whether 
there is a relationship between SAT scores and grade-point averages (GPAs) 
among a sample of college freshmen. The many uses of correlational 
research will be discussed in later chapters. 

Rapid Reference 1.6 

Categories of Research 

There are two broad categories of research with which researchers must 
be familiar: 

Quantitative vs. Qualitative 

• Quantitative research involves studies that make use of statistical 
analyses to obtain their findings. Key features include formal and 
systematic measurement and the use of statistics. 

• Qualitative research involves studies that do not attempt to quantify 
their results through statistical summary or analysis. Qualitative studies 
typically involve interviews and observations without formal measurement. 
A case study, which is an in-depth examination of one person, is a form of 
qualitative research. Qualitative research is often used as a source of 
hypotheses for later testing in quantitative research. 

Nomothetic vs. Idiographic 

• The nomothetic approach uses the study of groups to identify general 
laws that apply to a large group of people. The goal is often to identify 
the average member of the group being studied or the average performance 
of a group member. 

• The idiographic approach is the study of an individual. An example of 
the idiographic approach is the aforementioned case study. 

The choice of which research approaches to use largely depends on the 
types of questions being asked in the research study, and different fields 
of research typically rely on different categories of research to achieve 
their goals. Social science research, for example, typically relies on 
quantitative research and the nomothetic approach. In other words, social 
scientists study large groups of people and rely on statistical analyses 
to obtain their findings. These two broad categories of research will be 
the primary focus of this book. 

Rapid Reference 1.7 

Sample vs. Population 

Two key terms that you must be familiar with are "sample" and 
"population." The population is all individuals of interest to the 
researcher. For example, a researcher may be interested in studying 
anxiety among lawyers; in this example, the population is all lawyers. For 
obvious reasons, researchers are typically unable to study the entire 
population. In this case, it would be difficult, if not impossible, to 
study anxiety among all lawyers. Therefore, researchers typically study a 
subset of the population, and that subset is called a sample. 

Because researchers may not be able to study the entire population of 
interest, it is important that the sample be representative of the 
population from which it was selected. For example, the sample of lawyers 
the researcher studies should be similar to the population of lawyers. If 
the population of lawyers is composed mainly of White men over the age of 
35, studying a sample of lawyers composed mainly of Black women under the 
age of 30 would obviously be problematic because the sample is not 
representative of the population. Studying a representative sample permits 
the researcher to draw valid inferences about the population. In other 
words, when a researcher uses a representative sample, if something is 
true of the sample, it is likely also true of the population. 

Rapid Reference 1.8 

Two Types of Correlation 

Positive correlation: A positive correlation between two variables 
means that both variables change in the same direction (either both in- 
crease or both decrease). For example, if GPAs increase as SAT scores 
increase, there is a positive correlation between SAT scores and GPAs. 

Negative (inverse) correlation: A negative correlation between two 
variables means that as one variable increases, the other variable de- 
creases. In other words, the variables change in opposite directions. So, if 
GPAs decrease as SAT scores increase, there is a negative correlation 
between SAT scores and GPAs. 
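
The two directions of correlation can be illustrated with a small 
calculation. The sketch below computes the Pearson correlation coefficient 
by hand for two made-up data sets; the SAT scores, GPAs, and the helper 
name `pearson_r` are hypothetical and are not taken from the text. 

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = mean(xs), mean(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    spread = (sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y)) ** 0.5
    if spread == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return covariation / spread


# Hypothetical data for five students (illustrative values only).
sat_scores = [1000, 1100, 1200, 1300, 1400]
gpas_rising = [2.6, 2.9, 3.1, 3.4, 3.8]   # GPA increases as SAT score increases
gpas_falling = [3.8, 3.4, 3.1, 2.9, 2.6]  # GPA decreases as SAT score increases

if __name__ == "__main__":
    print("positive example: r =", round(pearson_r(sat_scores, gpas_rising), 2))
    print("negative example: r =", round(pearson_r(sat_scores, gpas_falling), 2))
```

A coefficient above zero indicates that the variables change in the same 
direction, and a coefficient below zero indicates that they change in 
opposite directions. 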

Prediction 

Another broad goal of research is prediction. Prediction-based research 
often stems from previously conducted descriptive research. If a re- 
searcher finds that there is a relationship (i.e., correlation) between two 
variables, then it may be possible to predict one variable from knowledge 
of the other variable. For example, if a researcher found that there is a re- 
lationship between SAT scores and GPAs, knowledge of the SAT scores 
alone would allow the researcher to predict the associated GPAs. 
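
One common way to turn a correlation into a prediction is to fit a 
straight line to the observed data. The sketch below uses ordinary least 
squares, which is an assumption on our part rather than a method named in 
the text; the data values and the function names `fit_line` and `predict` 
are likewise hypothetical. 

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = mean(xs), mean(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("x must vary in order to fit a line")
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Hypothetical data for five students (illustrative values only).
sat_scores = [1000, 1100, 1200, 1300, 1400]
gpas = [2.6, 2.9, 3.1, 3.4, 3.8]

if __name__ == "__main__":
    slope, intercept = fit_line(sat_scores, gpas)
    new_sat = 1250  # an applicant whose GPA is not yet known
    print(f"predicted GPA for SAT {new_sat}: {predict(new_sat, slope, intercept):.2f}")
```

The predicted value is an estimate, not a guarantee: the weaker the 
relationship between the two variables, the less accurate such predictions 
will be. 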

Many important questions in both science and the so-called real world 
involve predicting one thing based on knowledge of something else. For 
example, college admissions boards may attempt to predict success in col- 
lege based on the GPAs and SAT scores of the applicants. Employers may 
attempt to predict job success based on work samples, test scores, and can- 
didate interviews. Psychologists may attempt to predict whether a trau- 
matic life event leads to depression. Medical doctors may attempt to pre- 
dict what levels of obesity and high blood pressure are associated with 
cardiovascular disease and stroke. Meteorologists may attempt to predict 
the amount of rain based on the temperature, barometric pressure, hu- 
midity, and weather patterns. In each of these examples, a prediction is be- 
ing made based on existing knowledge of something else. 

Understanding/Explanation 

Being able to describe something and having the ability to predict one 
thing based on knowledge of another are important goals of scientific 
research, but they do not provide researchers with a true understanding of 
a phenomenon. One could argue that true understanding of a phenome- 
non is achieved only when researchers successfully identify the cause or 
causes of the phenomenon. For example, being able to predict a student's 
GPA in college based on his or her SAT scores is important and very prac- 
tical, but there is a limit to that knowledge. The most important limitation 
is that a relationship between two things does not permit an inference of 
causality. In other words, the fact that two things are related and knowl- 
edge of one thing (e.g., SAT scores) leads to an accurate prediction of the 
other thing (e.g., GPA) does not mean that one thing caused the other. For 
example, a relationship between SAT scores and freshman GPAs does not 
mean that the SAT scores caused the freshman-year GPAs. More than 
likely, the SAT scores are indicative of other things that may be more 
directly responsible for the GPAs. For example, the students who score 
high on the SAT may also be the students who spend a lot of time study- 
ing, and it is likely the amount of time studying that is the cause of a high 
GPA. 
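
The study-time example can be sketched as a small simulation. In the 
made-up data below, study hours drive both SAT scores and GPAs, and SAT 
scores have no direct influence on GPAs, yet the two are still strongly 
correlated. All numbers, the seed, and the function names 
(`simulate_students`, `pearson_r`, `partial_r`) are hypothetical. The 
partial correlation is a standard statistic for estimating the 
relationship between two variables while holding a third constant; it is 
our addition and is not discussed in the text. 

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    spread = (sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y)) ** 0.5
    return covariation / spread


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after statistically holding z constant."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate_students(n_students=2000, seed=0):
    """Generate study hours, SAT scores, and GPAs where hours cause both."""
    rng = random.Random(seed)
    hours, sat, gpa = [], [], []
    for _ in range(n_students):
        h = rng.uniform(0, 20)                          # weekly study hours
        hours.append(h)
        sat.append(800 + 30 * h + rng.gauss(0, 50))     # depends on hours only
        gpa.append(2.0 + 0.08 * h + rng.gauss(0, 0.3))  # depends on hours only
    return hours, sat, gpa


if __name__ == "__main__":
    hours, sat, gpa = simulate_students()
    r_sat_gpa = pearson_r(sat, gpa)
    r_adjusted = partial_r(r_sat_gpa, pearson_r(sat, hours), pearson_r(gpa, hours))
    print(f"SAT-GPA correlation:                 {r_sat_gpa:.2f}")
    print(f"SAT-GPA correlation, hours constant: {r_adjusted:.2f}")
```

In this simulated data set, the SAT-GPA relationship largely disappears 
once study time is held constant, which is what one would expect when a 
third variable is responsible for an observed correlation. 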

The ability of researchers to make valid causal inferences is determined 
by the type of research designs they use. Correlational research, as previ- 
ously noted, does not permit researchers to make causal inferences regard- 
ing the relationship between the two things that are correlated. By contrast, 
a randomized controlled study, which will be discussed in detail in Chapter 
5, permits researchers to make valid cause-and-effect inferences. 

There are three prerequisites for drawing an inference of causality be- 
tween two events (see Shaughnessy & Zechmeister, 1997). First, there 
must be a relationship (i.e., a correlation) between the two events. In other 
words, the events must covary — as one changes, the other must also 
change. If two events do not covary, then a researcher cannot conclude 
that one event caused the other event. For example, if there is no relation- 
ship between television viewing and deterioration of eyesight, then one 
cannot reasonably conclude that television viewing causes a deterioration 
of eyesight. 

Second, one event (the cause) must precede the other event (the effect). 
This is sometimes referred to as a time-order relationship. This should make 
intuitive sense. Obviously, if two events occur simultaneously, it cannot be 
concluded that one event caused the other. Similarly, if the observed effect 
comes before the presumed cause, it would make little sense to conclude 
that the cause caused the effect. 

Third, alternative explanations for the observed relationship must be 
ruled out. This is where it gets tricky. Stated another way, a causal 
explanation between two events can be accepted only when other possible 
causes of the observed relationship have been ruled out. An example may 
help to clarify this last required condition for causality. Let's say that 
a researcher is attempting to study the effects of two different 
psychotherapies on levels of depression. The researcher first obtains a 
representative sample of people with the same level of depression (as 
measured by a valid and reliable measure) and then randomly assigns them 
to one of two groups. Group 1 will get Therapy A and Group 2 will get 
Therapy B. The obvious goal is to compare levels of depression in both 
groups after providing the therapy. It would be unwise in this situation 
for the researcher, rather than assigning participants randomly, to assign 
all of the participants under age 30 to Group 1 and all of the 
participants over age 30 to Group 2. If, at the conclusion of the study, 
Group 1 and Group 2 differed significantly in levels of depression, the 
researcher would be unable to determine which variable (type of therapy or 
age) was responsible for the reduced depression. We would say that this 
research has been confounded, which means that two variables (in this 
case, the type of therapy and age) were allowed to vary (or be different) 
at the same time. Ideally, only the variable being studied (e.g., the type 
of therapy) will differ between the two groups. 

DON'T FORGET 

Prerequisites for Inferences of Causality 

• There must be an existing relationship between two events. 
• The cause must precede the effect. 
• Alternative explanations for the relationship must be ruled out. 
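
The confounding problem in the therapy example can be sketched with a 
short comparison of two assignment strategies. The participant ages, the 
seed, the age cutoff handling (participants who are exactly 30 go to 
Group 2), and the function names are all hypothetical choices made for 
illustration. 

```python
import random
from statistics import mean


def assign_by_age(ages, cutoff=30):
    """Confounded design: younger participants in Group 1, older in Group 2."""
    group_1 = [age for age in ages if age < cutoff]
    group_2 = [age for age in ages if age >= cutoff]
    return group_1, group_2


def assign_randomly(ages, rng):
    """Random assignment: shuffle participants, then split them in half."""
    shuffled = list(ages)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_age_gap(group_1, group_2):
    """Absolute difference between the two groups' average ages."""
    return abs(mean(group_1) - mean(group_2))


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [rng.randint(18, 60) for _ in range(200)]  # made-up sample

    by_age = assign_by_age(ages)
    by_chance = assign_randomly(ages, rng)
    print(f"age gap, assignment by age: {mean_age_gap(*by_age):.1f} years")
    print(f"age gap, random assignment: {mean_age_gap(*by_chance):.1f} years")
```

When groups are formed by age, the groups differ in both therapy and age, 
so any difference in depression could be due to either variable. Random 
assignment tends to leave the groups similar in age, so that the type of 
therapy is the main systematic difference between them. 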

OVERVIEW OF THE BOOK 

The focus of this book is, obviously, research design and methodology. 
Although these terms are sometimes incorrectly used interchangeably, 
they are distinct concepts with well-defined and circumscribed meanings. 
Therefore, before proceeding any further, it would behoove us to define 
these terms, at least temporarily. As defined by Kazdin (1992, 2003a), a 
recognized leader in the field of research, methodology refers to the prin- 
ciples, procedures, and practices that govern research, whereas research de- 
sign refers to the plan used to examine the question of interest. "Method- 
ology" should be thought of as encompassing the entire process of 
conducting research (i.e., planning and conducting the research study, 
drawing conclusions, and disseminating the findings). By contrast, "re- 
search design" refers to the many ways in which research can be con- 
ducted to answer the question being asked. These concepts will become 
clearer throughout this book, but it is important that you understand the 
focus of this book before reading any further. 

Essentials of Research Design and Methodology succinctly covers all of the 
major topic areas within research design and methodology. Each chapter 
in this book covers a specific research-related topic using easy-to- 
understand language and illustrative examples. The book is not meant, 
however, to replace the very extensive and comprehensive coverage of re- 
search issues that can be found in other publications. For those readers 
who would like a more in-depth understanding of the specific topic areas 
covered in this book, we would suggest looking to the publications in- 
cluded in the reference list at the end of this book. Finally, although each 
chapter builds upon the knowledge obtained from the previous chapters, 
each chapter can also be used as a stand-alone summary of the important 
points within that topic area. For this reason, we occasionally cover some 
of the same material in more than one chapter. 

The chapters in Essentials of Research Design and Methodology are organized 
in a manner that accurately reflects the logical flow of a research project 
from development to conclusion. The first three chapters lay the 
foundation for conducting a research project. This chapter introduced you to 
tion for conducting a research project. This chapter introduced you to 
some of the key concepts relating to science, research design, and method- 
ology. As will be discussed, at a basic level, the first step in conducting 
research involves coming up with an idea and translating that idea into a 
testable question or statement. Chapter 2 discusses these preliminary 
stages of research, including choosing a research idea, formulating a re- 
search problem, choosing appropriate independent and dependent vari- 
ables, and selecting a sample of participants for your study. As every re- 
searcher knows, coming up with a well-designed research study can be a 
challenging process, but the importance of that task cannot be overstated. 
Chapter 3 discusses some of the more common pitfalls faced by re- 
searchers when thinking about the design of a research study. 

After a research question has been formulated, researchers must 
choose a research design, collect and analyze the data, and draw some con- 
clusions. Chapter 4 will introduce you to the common measurement issues 
and strategies that must be considered when designing a research study. 
Chapter 5 will present a concise summary of the most common types of 
research designs that are available to researchers; as will be discussed, the 
type of research design chosen for a particular study depends largely on 
the question being asked. Chapter 6 will focus on one of the most impor- 
tant considerations in all of research — validity. Put simply, validity refers to 
the soundness of the research design being used, with high validity typi- 
cally producing more accurate and meaningful results. Validity comes in 
many forms, and Chapter 6 will discuss each one and how to maximize it 
in the course of research. Chapter 7 will introduce you to many of the is- 
sues faced by researchers when analyzing data and attempting to draw 
conclusions based on the data. 

Most research is subject to oversight by one or more ethical review 
committees, such as a university-based institutional review board. These 
committees are charged with the important task of reviewing all proposed 
research studies to ensure that they comply with applicable regulations 
governing research, which may be established by the university, the city, 
the state, or the federal government, depending on the nature of the 
research being conducted. Knowledge of the commonly encountered ethical 
issues will assist researchers in avoiding ethical violations and resolving 
ethical dilemmas. To this end, Chapter 8 will focus on the most commonly 
encountered ethical issues faced by researchers when designing and con- 
ducting a research study. Among other things, Chapter 8 will focus on the 
important topic of informed consent to research. 

Finally, Chapter 9 will present a brief section on the dissemination of 
research results, including publication in peer-reviewed journals and pre- 
sentations at professional conferences. Chapter 9 will include a distillation 
of major principles of research design and methodology that are appli- 
cable for those conducting research in a variety of capacities and settings. 
Chapter 9 will conclude by presenting a checklist of the major research- 
related concepts and considerations covered throughout this book. 

Before concluding this chapter, one word of caution is necessary re- 
garding the focus of this book. As stated previously, research studies come 
in many different forms, depending on the scientific discipline within 
which the research is being conducted. For example, most research stud- 
ies in the field of quantum physics take place in a laboratory and do not in- 
volve human participants. Contrast this with the research studies that are 
conducted by social scientists, which may often take place in real-world 
settings and involve human participants. For the sake of clarity, consis- 
tency, and ease of reading, we thought that it was necessary to narrow the 
focus of this book to one broad type of research. Therefore, throughout 
this book, we will focus primarily on empirical research involving human 
participants, which is most commonly found in the social and behavioral 
sciences. Focusing on this type of research permits us to explore a wider 
range of research-related considerations that must be addressed by re- 
searchers across many scientific disciplines. 

TEST YOURSELF 

1. ________ can be defined as a methodological and systematic approach 
to the acquisition of new knowledge. 

2. The defining characteristic of scientific research is the ________. 

3. The ________ approach relies on direct observation and experimentation 
in the acquisition of new knowledge. 

4. Scientists define key concepts and terms in the context of their research 
studies by using ________ definitions. 

5. What are the three general goals of scientific research? 

Answers: 1. Science; 2. scientific method; 3. empirical; 4. operational; 
5. description, prediction, and understanding/explanation 

Two 

PLANNING AND DESIGNING A 
RESEARCH STUDY 



As discussed in Chapter 1, engaging in research can be an exciting 
and rewarding endeavor. Through research, scientists attempt to 
answer age-old questions, acquire new knowledge, describe how 
things work, and ultimately improve the way we all live. Despite the 
exciting and rewarding nature of research, deciding to conduct a research 
study can be intimidating for inexperienced and experienced researchers 
alike. Novice researchers are frequently surprised, and often overwhelmed, 
by the sheer number of decisions that need to be made in the context of a 
research study. Depending on the scope and complexity of the research 
study being considered, there are typically dozens of research-related 
issues that need to be addressed in the planning stage alone. As a result, 
the early stages of planning a research study can seem daunting for novice 
researchers (and even for seasoned researchers with considerable 
experience, although they may not always freely admit it). 

As will become clear throughout this chapter, much of the work in- 
volved in conducting a research study actually takes place prior to con- 
ducting the study itself. All too often, novice researchers underestimate 
the amount of preparatory groundwork that needs to be accomplished 
prior to collecting any data. Although the preliminary work of getting a re- 
search study started differs depending on the type of research being con- 
ducted, there are some research-related issues that are common to most 
types of research. For example, prior to collecting any data at all, 
researchers must typically identify a topic area of interest, conduct a 
literature review, formulate a researchable question, articulate 
hypotheses, determine who or what will be studied, identify the 
independent and dependent variables that will be examined in the study, 
and choose an appropriate research methodology. And these are just a few 
of the more common research-related issues encountered by researchers. 
Furthermore, depending on the context in which the research is taking 
place, there may be a push to get the research study started sooner rather 
than later, which may further contribute to the researcher's feeling 
overwhelmed during the planning stage of a research study. 

In addition to these research-related issues, researchers may also need 
to consider several logistical and administrative issues. Administrative and 
logistical issues include things such as who is paying for the research, 
whether research staff need to be hired, where and when the research 
study will be conducted, and what approvals need to be obtained (and 
from whom) to conduct the research study. And this is just a small sam- 
pling of the preliminary issues that researchers need to address during the 
planning stage of a research study. 

The purpose of this chapter is to introduce you to this planning stage. 
Because research studies differ greatly, both in terms of scope and con- 
tent, this chapter cannot possibly address all of the issues that need to be 
considered when planning and designing a research study. Instead, this 
chapter will focus on the research-related issues that are most commonly 
encountered by researchers in all scientific fields (particularly those that 
involve human participants) when planning and designing a research 
study. In some ways, you can think of this chapter as a checklist of the ma- 
jor research-related issues that need to be considered during the planning 
stage. Although some of the topics discussed in this chapter may not be 
applicable in the context of your particular research, it is important for you 
to be aware of these issues. After discussing how researchers typically 
select the topics that they study, this chapter will discuss literature 
reviews, the formulation of research problems, the development of testable 
hypotheses, the identification and operationalization of independent and 
dependent variables, and the selection and assignment of research 
participants. Finally, this chapter will conclude with a discussion of the 
impact of multicultural issues on research. 



CHOOSING A RESEARCH TOPIC 

The first step in designing any research study is deciding what to study. 
Researchers choose the topics that they study in a variety of ways, and their 
decisions are necessarily influenced by several factors. For example, 
choosing a research topic will obviously be largely influenced by the sci- 
entific field within which the researcher works. As you know, "science" is 
a broad term that encompasses numerous specialized and diverse areas of 
study, such as biology, physics, psychology, anthropology, medicine, and 
economics, just to name a few. Researchers achieve competence in their 
particular fields of study through a combination of training and experi- 
ence, and it typically takes many years to develop an area of expertise. 

As you can probably imagine, it would be quite difficult for a researcher 
in one scientific field to undertake a research study involving a topic in an 
entirely different scientific field. For example, it is highly unlikely that a 
botanist would choose to study quantum physics or macroeconomics. In 
addition to his or her lacking the training and experience necessary for 
studying quantum physics or macroeconomics, it is probably reasonable 
to conclude that the botanist does not have an interest in conducting 
research studies in those areas. So, assuming that researchers have the 
proper training and experience to conduct research studies in their re- 
spective fields, let's turn our attention to how researchers choose the top- 
ics that they study (see Christensen, 2001; Kazdin, 1992). 

Interest 

First and foremost, researchers typically choose research topics that are of 
interest to them. Although this may seem like common sense, it is impor- 
tant to occasionally remind ourselves that researchers engage in research 
presumably because they have a genuine interest in the topics that they 
study. A good question to ask at this point is how research interests 
develop in the first place. There are several answers to this question. 

Many researchers entered their chosen fields of study with long- 
standing interests in those particular fields. For example, a psychologist 
may have decided to become a researcher because of a long-standing
interest in how childhood psychopathology develops or how anxiety
disorders can be effectively treated with psychotropic medications. Other
researchers may have entered their chosen fields of study with specific
interests and then refined those interests over the course of
their careers. Further, as many researchers will attest, it is certainly not
uncommon for researchers to develop new interests throughout their
careers. Through the process of conducting research, as well as the long
hours that are spent reviewing other people's research, researchers
often stumble onto new and unanticipated research ideas.

Regardless of whether researchers enter their chosen fields with spe- 
cific interests or develop new interests as they go along, many researchers 
become interested in particular research ideas simply by observing the 
world around them (as discussed in Chapter 1). Merely taking an interest 
in a specific observed phenomenon is the impetus for a great amount of 
research in all fields of study. In summary, a researcher's basic curiosity 
about an observed phenomenon typically provides sufficient motivation 
for choosing a research topic. 

Problem Solving 

Some research ideas may also stem from a researcher's motivation to solve 
a particular problem. In both our private and professional lives, we have 
probably all come across some situation or thing that has caught our at- 
tention as being in need of change or improvement. For example, a great 
deal of research is currently being conducted to make work environments 
less stressful, diets healthier, and automobiles safer. In each of these re- 
search studies, researchers are attempting to solve some specific problem, 
such as work-related stress, obesity, or dangerous automobiles. This type 
of problem-solving research is often conducted in corporate and
professional settings, primarily because the results of these types of research
studies typically have the added benefit of possessing practical utility. For 
example, finding ways for employers to reduce the work-related stress of 
employees could potentially result in increased levels of employee pro- 
ductivity and satisfaction, which in turn could result in increased eco- 
nomic growth for the organization. These types of benefits are likely to be 
of great interest to most corporations and businesses. 

Previous Research 

Researchers also choose research topics based on the results of prior re- 
search, whether conducted by them or by someone else. Researchers will 
likely attest that previously conducted research is a rich and plentiful 
source of research ideas. Through exposure to the results of research stud- 
ies, which are typically published in peer-reviewed journals (see Chapter 9 
for a discussion of publishing the results of research studies), a researcher 
may develop a research interest in a particular area. For example, a sociol- 
ogist who primarily studies the socialization of adolescents may take an in- 
terest in studying the related phenomenon of adolescent gang behavior 
after being exposed to research studies on that topic. In these instances, 
researchers may attempt to replicate the results obtained by the other re- 
searchers or perhaps extend the findings of the previous research to dif- 
ferent populations or settings. As noted by Kazdin (1992), a large portion 
of research stems from researchers' efforts to build upon, expand, or re- 
explain the results of previously conducted research studies. In fact, it is 
often quipped that "research begets research," primarily because research 
tends to raise more questions than it answers, and those newly raised ques- 
tions often become the focus of future research studies. 

Theory 

Finally, theories (see Rapid Reference 2.1 for a definition) often serve as a
good source for research ideas. Theories can serve several purposes, but
they typically function as a rich source of hypotheses that can be examined
empirically. This brings us to an important point that should not be
glossed over: research ideas (and the hypotheses that follow from those
ideas) should be based on some theory (Serlin, 1987). For example, a
researcher may have a theory regarding the development of depression
among elderly males. In this example, the researcher may theorize that
elderly males become depressed due to their
reduced ability to engage in enjoyable physical activities. This hypothetical
theory, like most other theories, makes a prediction. In this instance, the
theory makes a specific prediction about what causes depression among
elderly males. The predictions suggested by theories can often be
transformed into testable hypotheses that can then be examined empirically in
the context of a research study.

Rapid Reference 2.1

A theory is a conceptualization, or description, of a phenomenon that
attempts to integrate all that we know about the phenomenon into a
concise statement or question.

In the preceding paragraphs, we have only briefly touched upon several 
possible sources for research ideas. There are obviously many more 
sources we could have discussed, but space limitations preclude us from 
entering into a full discourse on this topic. The important point to re- 
member from this discussion is that research ideas can — and do — come 
from a variety of different sources, many of which we commonly en- 
counter in our daily lives. 

Throughout this discussion, you may have noticed that we have not 
commented on the quality of the research idea. Instead, we have limited 
our discussion thus far to how researchers choose research ideas, and not 
to whether those ideas are good ideas. There are many situations, however, 
in which the quality of the research idea is of paramount importance. For 
example, when submitting a research proposal as part of a grant applica- 
tion, the quality of the research idea is an important consideration in the 
funding decision. Although judging whether a research idea is good may 
appear to be somewhat subjective, there are some generally accepted
criteria that can help in this determination. Is the research idea creative? Will
the results of the research study make a valuable and significant contribu- 
tion to the literature or practice in a particular field? Does the research 
study address a question that is considered important in the field? Ques- 
tions like these can often be answered by looking through the existing lit- 
erature to see how the particular research study fits into the bigger picture. 
So, let's turn our attention to the logical next step in the planning phase of 
a research study: the literature review. 

LITERATURE REVIEW 

Once a researcher has chosen a specific topic, the next step in the planning 
phase of a research study is reviewing the existing literature in that topic 
area. If you are not yet familiar with the process of conducting a literature 
review, it simply means becoming familiar with the existing literature (e.g., 
books, journal articles) on a particular topic. Obviously, the amount of 
available literature can differ significantly depending on the topic area be- 
ing studied, and it can certainly be a time-consuming, arduous, and diffi- 
cult process if there has been a great deal of research conducted in a par- 
ticular area. Ask any researcher (or research assistant) about conducting 
literature reviews and you will likely encounter similar comments about 
the length of time that is spent looking for literature on a particular topic.

Fortunately, the development of comprehensive electronic databases
has facilitated the process of conducting literature reviews. In the past few
years, individual electronic databases have been developed for several
specific fields of study. For example, medical researchers can access existing
medical literature through Medline; social scientists can use PsycINFO
(see Rapid Reference 2.2) or PsycLIT; and legal researchers can use
Westlaw or Lexis. Access to most of these electronic database services is
restricted to individuals with subscriptions or to those who are affiliated
with university-based library systems. Although gaining access to these
services can be expensive, the advent of these electronic databases has
made the process of conducting thorough literature reviews much easier.

A literature review helps researchers become familiar with the existing
work in their topic areas. For example, if a researcher decides to
investigate the onset of diabetes among the elderly, it would be important
for him or her to have an understanding of the current state of the
knowledge in that area.

Literature reviews are absolutely indispensable when planning a re- 
search study because they can help guide the researcher in an appropriate 
direction by answering several questions related to the topic area. Have 
other researchers done any work in this topic area? What do the results of 
their studies suggest? Did previous researchers encounter any unforeseen 
methodological difficulties of which future researchers should be aware 
when planning or conducting studies? Does more research need to be 
conducted on this topic, and if so, in what specific areas? A thorough lit- 
erature review should answer these and related questions, thereby helping 
to set the stage for the research being planned. 

Often, the results of a well-conducted literature review will reveal that 
the study being planned has, in fact, already been conducted. This would 
obviously be important to know during the planning phase of a study, and 
it would certainly be beneficial to be aware of this fact sooner rather than 
later. Other times, researchers may change the focus or methodology of
their studies based on the types of studies that have already been
conducted. Literature reviews can often be intimidating for novice
researchers, but like most other things relating to research, they become
easier as you gain experience.

DON'T FORGET

Literature Reviews

Scouring the existing literature to get ideas for future research is a
technique used by most researchers. It is important to note, however, that
being familiar with the literature in a particular topic area also serves
another purpose. Specifically, it is crucial for researchers to know what types
of studies have been conducted in particular areas so they can determine
whether their specific research questions have already been answered. To
be clear, it is certainly a legitimate goal of research to replicate the results
of other studies, but there is a difference between replicating a study for
purposes of establishing the robustness or generalizability of the original
findings and simply duplicating a study without having any knowledge that
the same study has already been conducted. You can often save yourself a
good deal of time and money by simply looking to the literature to see
whether the study you are planning has already been conducted.

FORMULATING A RESEARCH PROBLEM 

After selecting a specific research topic and conducting a thorough litera- 
ture review, you are ready to take the next step in planning a research study: 
clearly articulating the research problem. The research problem (see Rapid 
Reference 2.3) typically takes the form of a concise question regarding the 
relationship between two or more variables. Examples of research prob- 
lems include the following: (1) Is the onset of depression among elderly 
males related to the development of physical limitations? (2) What effect 
does a sudden dip in the Dow Jones Industrial Average have on the
economy of small businesses? (3) Will a high-fiber, low-fat diet be effective in
reducing cholesterol levels among middle-aged females? (4) Can a mem- 
ory enhancement class improve the memory functioning of patients with 
progressive dementia?

By contrast, consider the following
vague and nonspecific research questions: (1) What effect does weather
have on memory? (2) Does exercise improve physical and mental health? 
(3) Does taking street drugs result in criminal behavior? As you can see, 
each of these questions is rather vague, and it is impossible to determine 
exactly what is being studied. For example, in the first question, what type 
of weather is being studied, and memory for what? In the second question, is 
the researcher studying all types of exercise, and the effects of exercise on 
the physical and mental health of all people or a specific subgroup of 
people? Finally, in the third question, which street drugs are being studied, 
and what specific types of criminal behavior?

An effective way to avoid confusion in formulating research questions 
is by using operational definitions. Through the use of operational defini- 
tions, researchers can specifically and clearly identify what (or who) is 
being studied (see Kazdin, 1992). As briefly discussed in Chapter 1, re- 
searchers use operational definitions to define key concepts and terms in 
the specific contexts of their research studies. The benefit of using opera- 
tional definitions is that they help to ensure that everyone is talking about 
the same phenomenon. Among other things, this will greatly assist future 
researchers who attempt to replicate a given study's results. Obviously, if
researchers cannot determine what or who is being studied, they will
certainly not be able to replicate the study. Let's look at an example of how 
operational definitions can be effectively used when formulating a re- 
search question. 

Let's say that a researcher is interested in studying the effects of large 
class sizes on the academic performance of gifted children in high- 
population schools. The research question may be phrased in the follow- 
ing manner: "What effects do large class sizes have on the academic per- 
formance of gifted children in high-population schools?" This may seem 
to be a fairly straightforward research question, but upon closer examina- 
tion, it should become evident that there are several important terms and 
concepts that need to be defined. For example, what constitutes a "large 
class"; what does "academic performance" refer to; which kids are con- 
sidered "gifted"; and what is meant by "high-population schools"? 

To reduce confusion, the terms and concepts included in the research 
question need to be clarified through the use of operational definitions. 
For example, "large classes" may be defined as classes with 30 or more
students; "academic performance" may be limited to scores received on
standardized achievement tests; "gifted" children may include only those
children who are in advanced classes; and "high-population schools" may be
defined as schools with more than 1,000 students. Without operationally
defining these key terms and concepts, it would be difficult to determine
what exactly is being studied. Further, the specificity of the operational
definitions will allow future researchers to replicate the research study.

DON'T FORGET

Operational Definitions

An important point to keep in mind is that an operational definition is
specific to the particular study in which it is used. Although researchers
can certainly use the same operational definitions in different studies
(which facilitates replication of the study results), different studies can
operationally define the same terms and concepts in different ways. For
example, in one study, a researcher may define "gifted children" as those
children who are in advanced classes. In another study, however, "gifted
children" may be defined as children with IQs of 130 or higher. There is
no one correct definition of "gifted children," but providing an operational
definition reduces confusion by specifying what is being studied.
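
For readers who find it easier to think in code, the operational definitions
from the class-size example can be written down as explicit rules. The
following Python sketch is only an illustration: the thresholds (30 or more
students, more than 1,000 students, enrollment in advanced classes) come from
the example above, whereas the names used here (Student, is_gifted, and so on)
and the sample records are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the class-size example in the text.
LARGE_CLASS_MIN_STUDENTS = 30        # "large class": 30 or more students
HIGH_POPULATION_MIN_STUDENTS = 1000  # "high-population school": more than 1,000


@dataclass
class Student:
    """Hypothetical record for one child; the field names are illustrative."""
    name: str
    in_advanced_classes: bool
    class_size: int
    school_enrollment: int
    achievement_score: float  # "academic performance": standardized test score


def is_gifted(student: Student) -> bool:
    # Operational definition used in this example: enrolled in advanced classes.
    return student.in_advanced_classes


def is_large_class(student: Student) -> bool:
    return student.class_size >= LARGE_CLASS_MIN_STUDENTS


def is_high_population_school(student: Student) -> bool:
    return student.school_enrollment > HIGH_POPULATION_MIN_STUDENTS


def in_study_population(student: Student) -> bool:
    """Gifted children in high-population schools, as operationally defined."""
    return is_gifted(student) and is_high_population_school(student)


# Made-up records, for illustration only.
students = [
    Student("A", True, 32, 1500, 88.0),
    Student("B", True, 22, 1200, 91.0),
    Student("C", False, 35, 1800, 75.0),  # excluded: not in advanced classes
    Student("D", True, 30, 1000, 84.0),   # excluded: exactly 1,000 is not "more than"
    Student("E", True, 30, 2100, 86.0),
    Student("F", True, 29, 1100, 93.0),
]

eligible = [s for s in students if in_study_population(s)]
large_class_group = [s.name for s in eligible if is_large_class(s)]
smaller_class_group = [s.name for s in eligible if not is_large_class(s)]

print("Large classes:", large_class_group)
print("Smaller classes:", smaller_class_group)
```

Because every rule is explicit, another researcher could apply exactly the
same definitions to a new sample, or change a single threshold and state
plainly how the new study differs from the original.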

ARTICULATING HYPOTHESES 

The next step in planning a research study is articulating the hypotheses 
that will be tested. This is yet another step in the planning phase of a 
research study that can be somewhat intimidating for inexperienced re- 
searchers. Articulating hypotheses is truly one of the most important steps 
in the research planning process, because poorly articulated hypotheses 
can ruin what may have been an otherwise good study. The following dis- 
cussion regarding hypotheses can get rather complicated, so we will at- 
tempt to keep the discussion relatively short and to the point. 

As briefly discussed in Chapter 1, hypotheses attempt to explain, predict, 
and explore the phenomenon of interest. In many types of studies, this 
means that hypotheses attempt to explain, predict, and explore the rela- 
tionship between two or more variables (Kazdin, 1992; see Christensen, 
2001). To this end, hypotheses can be thought of as the researcher's edu- 
cated guess about how the study will turn out. As such, the hypotheses 
articulated in a particular study should logically stem from the research 
problem being investigated. 

Before we discuss specific types of hypotheses, there are two important 
points that you should keep in mind. First, all hypotheses must be
falsifiable. That is, hypotheses must be capable of being refuted based on the
results of the study (Christensen, 2001). This point cannot be emphasized
enough. Put simply, if a researcher's hypothesis cannot be refuted, then the 
researcher is not conducting a scientific investigation. Articulating hy- 
potheses that are not falsifiable is one sure way to ruin what could have 
otherwise been a well-conducted and important research study. Second, as 
briefly discussed in Chapter 1, a hypothesis must make a prediction (usually 
about the relationship between two or more variables). The predictions 
embodied in hypotheses are subsequently tested empirically by gathering
and analyzing data, and the hypotheses can then be either supported or 
refuted. 

Now that you have been introduced to the topic of hypotheses, we 
should turn our attention to specific types of hypotheses. There are two 
broad categories of hypotheses with which you should be familiar. 

Null Hypotheses and Alternate Hypotheses 

The first category of research hypotheses, which was briefly discussed in 
Chapter 1, includes the null hypothesis and the alternate (or experimental) hy- 
pothesis. In research studies involving two groups of participants (e.g., ex- 
perimental group vs. control group), the null hypothesis always predicts 
that there will be no differences between the groups being studied 
(Kazdin, 1992). If, however, a particular research study does not involve 
groups of study participants, but instead involves only an examination of 
selected variables, the null hypothesis predicts that there will be no rela- 
tionship between the variables being studied. By contrast, the alternate 
hypothesis always predicts that there will be a difference between the 
groups being studied (or a relationship between the variables being stud- 
ied). 

Let's look at an example to clarify the distinction between null hy- 
potheses and alternate hypotheses. In a research study investigating the ef- 
fects of a newly developed medication on blood pressure levels, the null 
hypothesis would predict that there will be no difference in terms of blood 
pressure levels between the group that receives the medication (i.e., the 
experimental group) and the group that does not receive the medication 
(i.e., the control group). By contrast, the alternate hypothesis would pre- 
dict that there will be a difference between the two groups with respect to 
blood pressure levels. So, for example, the alternate hypothesis may pre- 
dict that the group that receives the new medication will experience a 
greater reduction in blood pressure levels than the group that does not re- 
ceive the new medication. 

It is not uncommon for research studies to include several null and
alternate hypotheses. The number of null and alternate hypotheses included
in a particular research study depends on the scope and complexity of the 
study and the specific questions being asked by the researcher. It is im- 
portant to keep in mind that the number of hypotheses being tested has 
implications for the number of research participants that will be needed to 
conduct the study. This last point rests on rather complex statistical con- 
cepts that we will not discuss in this section. For our purposes, it is suffi- 
cient to remember that as the number of hypotheses increases, the num- 
ber of required participants also typically increases. 

In scientific research, keep in mind that it is the null hypothesis that is
tested, and then the null hypothesis is either confirmed or refuted (sometimes
phrased as not rejected or rejected). Remember, if the null hypothesis is
rejected (and that decision is based on the results of statistical analyses,
which will be discussed in later chapters), the researcher can reasonably
conclude that there is a difference between the groups being studied (or a
relationship between the variables being studied). Rejecting the null
hypothesis allows a researcher to not reject the alternate hypothesis, and not
rejecting a hypothesis is the most we can do in scientific research. To be
clear, we can never accept a hypothesis; we can only fail to reject a hypothesis
(as was briefly discussed in Chapter 1). Accordingly, researchers typically
seek to reject the null hypothesis, which empirically demonstrates that the 
groups being studied differ on the variables being examined in the study. 
This last point may seem counterintuitive, but it is an extremely important 
concept that you should keep in mind. 
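
To make the logic of testing the null hypothesis a little more concrete, here
is a minimal Python sketch based on the blood pressure example. It is offered
under several assumptions: the data are made up, the significance level of
.05 is simply a common convention, and a permutation test is swapped in as
one simple way of testing for a group difference (this text does not
prescribe a particular test). The function name permutation_p_value is
hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis of no group difference.

    Estimates how often a random relabeling of the pooled scores produces an
    absolute difference in group means at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(experimental) - mean(control))
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_exp]) - mean(pooled[n_exp:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up reductions in blood pressure, for illustration only.
medication_group = [12, 15, 9, 14, 11, 13, 16, 10]
control_group = [3, 5, 1, 4, 6, 2, 5, 3]

ALPHA = 0.05  # assumed significance level
p_value = permutation_p_value(medication_group, control_group)

if p_value < ALPHA:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"p = {p_value:.4f}: {decision}")
```

Note that the two possible outcomes are worded as "reject" and "fail to
reject." Consistent with the discussion above, the sketch never reports that
a hypothesis has been accepted.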

Directional Hypotheses and Nondirectional Hypotheses 

The second category of research hypotheses includes directional hy- 
potheses and nondirectional hypotheses. In research studies involving 
groups of study participants, the decision regarding whether to use a di- 
rectional or a nondirectional hypothesis is based on whether the re- 
searcher has some idea about how the groups being studied will differ. 
Specifically, researchers use nondirectional hypotheses when they believe that 
the groups will differ, but they do not have a belief regarding how the 
groups will differ (i.e., in which direction they will differ). By contrast,
researchers use directional hypotheses when they believe that the groups being
studied will differ, and they have a belief regarding how the groups will dif- 
fer (i.e., in a particular direction). 

A simple example should help clarify the important distinction between 
directional and nondirectional hypotheses. Let's say that a researcher is 
using a standard two-group design (i.e., one experimental group and one 
control group) to investigate the effects of a memory enhancement class 
on college students' memories. At the beginning of the study, all of the 
study participants are randomly assigned to one of the two groups. (We 
will talk about the important concept of random assignment later in this 
chapter and in Chapter 3, and about the concept of informed consent,
which we mention briefly in Rapid Reference 2.4, in Chapter 8.) Subsequently,
one group (i.e., the experimental group) will be exposed to the
memory enhancement class and the other group (i.e., the control group) 
will not be exposed to the memory enhancement class. Afterward, all of 
the participants in both groups will be administered a memory test. Based 
on this research design, any observed differences between the two groups 
on the memory test can reasonably be attributed to the effects of the 
memory enhancement class. 
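
The random assignment step in this example can be sketched in a few lines of
Python. This is only one simple way to do it, under the assumption that we
want two groups of equal (or nearly equal) size; the function name
randomly_assign and the participant labels are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups.

    If the number of participants is odd, the control group gets the extra one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Twenty hypothetical participants, labeled P01 through P20.
participant_ids = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participant_ids, seed=42)

print("Experimental group:", groups["experimental"])
print("Control group:", groups["control"])
```

Fixing the seed makes the assignment reproducible, which can be useful for
documenting exactly how the groups were formed.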



Rapid Reference 2.4

Informed Consent

Prior to your collecting any data from study participants, the participants
must voluntarily agree to participate in the study. Through a process called
informed consent, all potential study participants are informed about the
procedures that will be used in the study, the risks and benefits of
participating in the study, and their rights as study participants. There are,
however, a few limited instances in which researchers are not required to
obtain informed consent from the study participants, and it is therefore
important that researchers become knowledgeable about when informed
consent is required. The topic of informed consent will be discussed in
detail in Chapter 8.

Rapid Reference 2.5

Nondirectional Hypotheses vs. Directional Hypotheses

A reliable way to tell the difference between directional and
nondirectional hypotheses is to look at the wording of the hypotheses. If the
hypothesis simply predicts that there will be a difference between the two
groups, then it is a nondirectional hypothesis. It is nondirectional because
it predicts that there will be a difference but does not specify how the
groups will differ. If, however, the hypothesis uses so-called comparison
terms, such as "greater," "less," "better," or "worse," then it is a directional
hypothesis. It is directional because it predicts that there will be a
difference between the two groups and it specifies how the two groups will
differ.



In this example, the researcher has several options in terms of hy- 
potheses. On the one hand, the researcher may simply hypothesize that 
there will be a difference between the two groups on the memory test. 
This would be an example of a nondirectional hypothesis, because the re- 
searcher is hypothesizing that the two groups will differ, but the researcher 
is not specifying how the two groups will differ. Alternatively, the re- 
searcher could hypothesize that the participants who are exposed to the 
memory enhancement class will perform better on the memory test than 
the participants who are not exposed to the memory enhancement class. 
This would be an example of a directional hypothesis, because the re- 
searcher is hypothesizing that the two groups will differ and specifying how 
the two groups will differ (i.e., one group will perform better than the 
other group on the memory test). See Rapid Reference 2.5 for a tip on how 
to distinguish between directional and nondirectional hypotheses. 
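
The wording-based tip for telling the two types of hypotheses apart can be
expressed as a short Python sketch. It is deliberately crude: it only checks
for the four comparison terms mentioned in this chapter ("greater," "less,"
"better," and "worse"), so it should be read as an illustration of the idea
rather than a dependable classifier. The function name classify_hypothesis
is hypothetical.

```python
import re

# Comparison terms mentioned in the text; a real list would need to be longer.
COMPARISON_TERMS = ("greater", "less", "better", "worse")


def classify_hypothesis(statement: str) -> str:
    """Label a hypothesis as directional if it contains a comparison term."""
    words = re.findall(r"[a-z]+", statement.lower())
    if any(term in words for term in COMPARISON_TERMS):
        return "directional"
    return "nondirectional"


nondirectional_example = (
    "There will be a difference between the two groups on the memory test."
)
directional_example = (
    "Participants exposed to the memory enhancement class will perform "
    "better on the memory test than participants who are not exposed to it."
)

labels = [
    classify_hypothesis(nondirectional_example),
    classify_hypothesis(directional_example),
]
print(labels)
```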

CHOOSING VARIABLES TO STUDY 

We are now very close to beginning the actual study, but there are still a few 
things remaining to do before we begin collecting data. Before proceeding 
any further, it would probably be helpful for us to take a moment and see
where we are in the process. So far, we have seen how researchers (1) choose
research topics; (2) review the existing literature; (3) formulate
concise research problems with clearly defined concepts and terms (using
operational definitions); and (4) articulate falsifiable hypotheses. We have
certainly accomplished quite a bit, but there is still a little more to do before beginning the
study itself. 

The next step in planning a research study is identifying what variables 
(see Rapid Reference 2.6) will be the focus of the study. There are many 
categories of variables that can appear in research studies. However, rather 
than discussing every conceivable one, we will focus our attention on the 
most commonly used categories. Although not every research study will 
include all of these variables, it is important that you are aware of the dif- 
ferences among the categories and when each type of variable may be 
used. 

Independent Variables vs. Dependent Variables

When discussing variables, perhaps the most important distinction is be- 
tween independent and dependent variables. The independent variable is the 
factor that is manipulated or controlled by the researcher. In most studies, 
researchers are interested in examining the effects of the independent 
variable. In its simplest form, the independent variable has two levels: pre- 
sent or absent. For example, in a research study investigating the effects of 
a new type of psychotherapy on symptoms of anxiety, one group will be 
exposed to the psychotherapy and one group will not be exposed to the
psychotherapy. In this example, the independent variable is the psycho- 
therapy, because the researcher can control whether the study participants 
are exposed to it and the researcher is interested in examining the effects 
of the psychotherapy on symptoms of anxiety. As you may already know, 
the group in which the independent variable is present (i.e., that is exposed 
to the psychotherapy) is referred to as the experimental group, whereas the 
group in which the independent variable is not present (i.e., that is not ex- 
posed to the psychotherapy) is referred to as the control group. 

Although, in its simplest form, an independent variable has only two 
levels (i.e., present or absent), it is certainly not uncommon for an inde- 
pendent variable to have more than two levels. For example, in a research 
study examining the effects of a new medication on symptoms of depres- 
sion, the researcher may include three groups in the study — one control 
group and two experimental groups. As usual, the control group would 
not get the medication (or would get a placebo), while one experimental 
group may get a lower dose of the medication and the other experimental 
group may get a higher dose of the medication. In this example, the inde- 
pendent variable (i.e., medication) consists of three levels: absent, low, and 
high. Other levels of independent variables are, of course, also possible, 
such as low, medium, and high; or absent, low, medium, and high. Re- 
searchers make decisions regarding the number of levels of an indepen- 
dent variable based on a careful consideration of several factors, including 
the number of available study participants, the degree of specificity of re- 
sults they desire to achieve with the study, and the associated financial 
costs. 

It is also common for a research study to include multiple independent 
variables, perhaps with each of the independent variables consisting of 
multiple levels. For example, a researcher may attempt to investigate the 
effects of both medication and psychotherapy on symptoms of depres- 
sion. In this example, there are two independent variables (i.e., medication 
and psychotherapy), and each independent variable could potentially con- 
sist of multiple levels (e.g., low, medium, and high doses of medication; 
cognitive behavioral therapy, psychodynamic therapy, and rational emotive
therapy). As you can see, things have a tendency to get complicated
fairly quickly when researchers use multiple independent variables with 
multiple levels. 
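
To see how quickly the number of conditions grows, consider the following
Python sketch. It assumes a fully crossed design in which every level of one
independent variable is combined with every level of the other; the text
does not say that the study would be set up this way, so treat this as one
possible arrangement. The levels are taken from the example above, and the
variable names are hypothetical.

```python
from itertools import product

# Levels of the two independent variables, taken from the example in the text.
medication_levels = ["low dose", "medium dose", "high dose"]
psychotherapy_levels = [
    "cognitive behavioral therapy",
    "psychodynamic therapy",
    "rational emotive therapy",
]

# In a fully crossed design, each pairing of levels is a separate condition.
conditions = list(product(medication_levels, psychotherapy_levels))

for medication, therapy in conditions:
    print(f"{medication} + {therapy}")

print(f"{len(conditions)} conditions in total")
```

Under this assumption, the number of conditions is the product of the number
of levels of each independent variable, so adding a level or a variable can
sharply increase the number of groups (and participants) that are needed.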

At this point in the discussion, you should be actively resisting the urge 
to be intimidated by the material presented so far in this chapter. We have 
covered quite a bit of information, and it is getting more complicated as 
we go. Keeping track of the different categories and types of variables can 
certainly be difficult, even for those of us with considerable research ex- 
perience. If you are getting confused, it may be helpful to reduce things to 
their simplest terms. In the case of independent variables, the important 
point to keep in mind is that researchers are interested in examining the ef- 
fects of an independent variable on something, and that something is the 
dependent variable (Isaac & Michael, 1997). Let's now turn our attention 
to dependent variables. 

The dependent variable is a measure of the effect (if any) of the indepen- 
dent variable. For example, a researcher may be interested in examining 
the effects of a new medication on symptoms of depression among col- 
lege students. In this example, prior to administering any medication, the 
researcher would most likely administer a valid and reliable measure of
depression, such as the Beck Depression Inventory (Beck, Ward, Mendelson,
Mock, & Erbaugh, 1961), to a group of study participants. The Beck
Depression Inventory is a well-accepted self-report inventory of symp- 
toms of depression. Administering a measure of depression to the study 
participants prior to administering any medication allows the researcher to 
obtain what is called a baseline measure of depression, which simply means 
a measurement of the levels of depression that are present prior to the ad- 
ministration of any intervention (e.g., psychotherapy, medication). The re- 
searcher then randomly assigns the study participants to two groups, an 
experimental group that receives the new medication and a control group 
that does not receive the new medication (perhaps its members are ad- 
ministered a placebo). 

After administering the medication (or not administering the medica- 
tion, for the control group), the researcher would then readminister the 
Beck Depression Inventory to all of the participants in both groups. The
researcher now has two Beck Depression Inventory scores for each of the 
participants in both groups — one score from before the medication was 
administered and one score from after the medication was administered. 
(By the way, this type of research design is referred to as a pre/post design,
because the dependent variable is measured both before and after the in- 
tervention is administered. We will talk about this type of research design 
in Chapter 5.) These two depression scores can then be compared to de- 
termine whether the medication had any effect on the levels of depression. 
Specifically, if the scores on the Beck Depression Inventory decrease 
(which indicates lower levels of depression) for the participants in the ex- 
perimental group, but not for the participants in the control group, then 
the researcher can reasonably conclude that the medication was effective 
in reducing symptoms of depression. To be more precise, for the re- 
searcher to conclude that the medication was effective in reducing symp- 
toms of depression, there would need to be a statistically significant difference 
in Beck Depression Inventory scores between the experimental group and 
the control group, but we will put that point aside for the moment. 
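
The comparison described above can be sketched with a small amount of Python. The scores below are invented purely for illustration, and the sketch stops at the descriptive step: it computes the average change in each group but does not test whether the difference is statistically significant.

```python
from statistics import mean

# Hypothetical Beck Depression Inventory scores, stored as (pre, post) pairs.
experimental = [(28, 15), (31, 20), (25, 14), (30, 18)]
control = [(27, 26), (32, 30), (26, 27), (29, 28)]


def mean_change(scores):
    """Average post-minus-pre change; negative values mean scores dropped."""
    return mean(post - pre for pre, post in scores)


experimental_change = mean_change(experimental)
control_change = mean_change(control)

print(f"Experimental group mean change: {experimental_change}")
print(f"Control group mean change: {control_change}")
```

A larger drop in the experimental group than in the control group is the pattern the researcher is looking for, but a formal statistical test would still be needed before drawing any conclusion.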

Before proceeding any further, take a moment and see whether you can 
identify the independent and dependent variables in our example. Have 
you figured it out? In this example, the new medication is the independent 
variable because it is under the researcher's control and the researcher is 
interested in measuring its effect. The Beck Depression Inventory score is 
the dependent variable because it is a measure of the effect of the inde- 
pendent variable. 

When students are exposed to research terminology for the first time, 
it is not uncommon for them to confuse the independent and dependent 
variables. Fortunately, there is an easy way to remember the difference be- 
tween the two. If you get confused, think of the independent variable as 
the "cause" and the dependent variable as the "effect." To assist you in this 
process, it may be helpful if you practice stating your research question
in the following manner: "What are the effects of ________ on ________?"
The first blank is the independent variable and the second blank is the
dependent variable. For example, we may ask the following research
question: "What are the effects of exercise on levels of body fat?"
In this example, "exercise" is the independent variable and "levels of body
fat" is the dependent variable. Rapid Reference 2.7 summarizes the
distinction between the two, and Rapid Reference 2.8 uses this distinction to
further our understanding of the term "research."

Rapid Reference 2.7

Independent Variables and Dependent Variables

The independent variable is called "independent" because it is independent
of the outcome being measured. More specifically, the independent
variable is what causes or influences the outcome. The dependent variable
is called "dependent" because it is influenced by the independent variable.
For example, in our hypothetical study examining the effects of medication
on symptoms of depression, the measure of depression is the dependent
variable because it is influenced by (i.e., is dependent on) the
independent variable (i.e., the medication).

Now that we know the difference between independent and dependent
variables, we should focus our attention on how researchers choose these
variables for inclusion in their research studies. An important point to
keep in mind is that the researcher selects the independent and dependent
variables based on the research problem and the hypotheses. In many ways,
this simplifies the process of selecting variables by requiring the
selection of independent and dependent variables to flow logically from the
statement of the research problem and the hypotheses. Once the research
problem and the hypotheses are articulated, it should not take too much
effort to identify the independent and dependent variables.

Rapid Reference 2.8

Definition of "Research"

In Chapter 1, we briefly defined research as an examination of the
relationship between two or more variables. We can now be a little more
specific in our definition of "research." Research is an examination of the
relationship between one or more independent variables and one or more
dependent variables. In even more precise terms, we can define research as
an examination of the effects of one or more independent variables on one
or more dependent variables.

Perhaps another example will clarify this important point. Suppose that 
a researcher is interested in examining the relationship between intake of 
dietary fiber and the incidence of colon cancer among elderly males. The 
research problem may be stated in the following manner: "Does increased 
consumption of dietary fiber result in a decreased incidence of colon can- 
cer among elderly males?" Using our suggested phrasing from the previ- 
ous paragraph, we could also ask the following question: "What are the 
effects of dietary fiber consumption on the incidence of colon cancer 
among elderly males?" Following logically from this research problem, the 
researcher may hypothesize the following: "High levels of dietary fiber 
consumption will decrease the incidence of colon cancer among elderly 
males." Obviously, several terms in this hypothesis need to be opera- 
tionally defined, but we can skip that step for the purposes of the current 
example. It takes only a cursory examination of the research problem and 
related hypothesis to determine the independent variable and dependent 
variable for this study. Have you figured it out yet? Because the researcher 
is interested in examining the effects of consuming dietary fiber on the in- 
cidence of colon cancer, "dietary fiber consumption" is the independent 
variable and a measure of the "incidence of colon cancer" is the depen- 
dent variable. 



Categorical Variables vs. Continuous Variables 

Now that you are familiar with the difference between independent and 
dependent variables, we will turn our attention to another category of vari- 
ables with which you should be familiar. The distinction between categor- 
ical variables and continuous variables frequently arises in the context of 
many research studies. Categorical variables are variables that can take on 
specific values only within a defined range of values. For example, "gen- 
der" is a categorical variable because you can either be male or female. 
There is no middle ground when it comes to gender; you can either be 
male or female; you must be one, and you cannot be both. "Race," "marital
status," and "hair color" are other common examples of categorical
variables. Although this may sound obvious, it is often helpful to think
of categorical variables as consisting of discrete, mutually exclusive
categories, such as "male/female," "White/Black," "single/married/divorced,"
and "blonde/brunette/redhead." In contrast with categorical
variables, continuous variables are variables that can theoretically take on any
value along a continuum. For example, "age" is a continuous variable
because, theoretically at least, someone can be any age. "Income," "weight,"
and "height" are other examples of continuous variables. As we will see,
the type of data produced from using categorical variables differs from the
type of data produced from using continuous variables.

Putting It Into Practice

Varying Independent Variables and Measuring Dependent Variables

Assuming that a researcher has a well-articulated and specific hypothesis,
it is a fairly straightforward task to identify the independent and
dependent variables. Often, the difficult part is determining how to vary the
independent variable and measure the dependent variable. For example,
let's say that a researcher is interested in examining the effects of viewing
television violence on levels of prosocial behavior. In this example, we can
easily identify the independent variable as viewing television violence and
the dependent variable as prosocial behavior. The difficult part is finding
ways to vary the independent variable (how can the researcher vary the
viewing of television violence?) and measure the dependent variable (how
can the researcher measure prosocial behavior?). Finding ways to vary the
independent variable and measure the dependent variable often requires
as much creativity as scientific know-how.

In some circumstances, researchers may decide to convert some con- 
tinuous variables into categorical variables. For example, rather than using 
"age" as a continuous variable, a researcher may decide to make it a cate- 
gorical variable by creating discrete categories of age, such as "under age 
40" or "age 40 or older." "Income," which is often treated as a continuous 
variable, may instead be treated as a categorical variable by creating
discrete categories of income, such as "under $25,000 per year,"
"$25,000-$50,000 per year," and "over $50,000 per year." The benefit of using
continuous variables is that they can be measured with a higher degree of
precision. For example, it is more informative to record someone's age as "47
years old" (continuous) as opposed to "age 40 or older" (categorical). The
use of continuous variables gives the researcher access to more specific
data. See Rapid Reference 2.9.

Rapid Reference 2.9

Categorical Variables vs. Continuous Variables

The decision of whether to use categorical or continuous variables will
have an effect on the precision of the data that are obtained. When
compared with categorical variables, continuous variables can be measured
with a greater degree of precision. In addition, the choice of which
statistical tests will be used to analyze the data is partially dependent on
whether the researcher uses categorical or continuous variables. Certain
statistical tests are appropriate for categorical variables, while other
statistical tests are appropriate for continuous variables. As with many
decisions in the research-planning process, the choice of which type of
variable to use is partially dependent on the question that the researcher is
attempting to answer.
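
The conversion of a continuous variable into a categorical one can be sketched as follows. The category labels are the ones used in the age and income examples; the function names are hypothetical, and the handling of the exact boundary values (for instance, placing an income of exactly $25,000 in the middle category) is an assumption, because the examples do not specify it.

```python
def age_category(age_in_years):
    """Collapse a continuous age into the two categories used in the text."""
    return "under age 40" if age_in_years < 40 else "age 40 or older"


def income_category(income_per_year):
    """Collapse annual income (in dollars) into the three bands in the text.

    Assumption: incomes of exactly 25,000 or 50,000 fall in the middle band.
    """
    if income_per_year < 25_000:
        return "under $25,000 per year"
    if income_per_year <= 50_000:
        return "$25,000-$50,000 per year"
    return "over $50,000 per year"


# Two hypothetical participants who differ on the continuous measures
# become indistinguishable once the variables are made categorical.
print(age_category(47), "|", age_category(83))
print(income_category(51_000), "|", income_category(250_000))
```

The sketch makes the loss of precision concrete: once the categories are applied, a 47-year-old and an 83-year-old carry exactly the same value.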



Quantitative Variables vs. Qualitative Variables 

Finally, before moving on to a different topic, it would behoove us to 
briefly discuss the distinction between qualitative variables and quantita- 
tive variables. Qualitative variables are variables that vary in kind, while quan- 
titative variables are those that vary in amount (see Christensen, 2001). This 
is an important yet subtle distinction that frequently arises in research 
studies, so let's take a look at a few examples. 

Rating something as "attractive" or "not attractive," "helpful" or "not 
helpful," or "consistent" or "not consistent" are examples of qualitative 
variables. In these examples, the variables are considered qualitative
because they vary in kind (and not amount). For example, the thing being
rated is either "attractive" or "not attractive," but there is no indication of
the level (or amount) of attractiveness. By contrast, reporting the number 
of times that something happened or the number of times that someone 
engaged in a particular behavior are examples of quantitative variables. 
These variables are considered quantitative because they provide infor- 
mation regarding the amount of something. 

As stated at the beginning of this section, there are several other cate- 
gories of variables that we will not be discussing in this text. What we have 
covered in this section are the major categories that most commonly ap- 
pear in research studies. One final comment is necessary. It is important to 
keep in mind that a single variable may fit into several of the categories that 
we have discussed. For example, the variable "height" is both continuous 
(if measured along a continuum) and quantitative (because we are getting 
information regarding the amount of height). Along similar lines, the vari- 
able "eye color" is both categorical (because there is a limited number of 
discrete categories of eye color) and qualitative (because eye color varies 
in kind, not amount). 

If this discussion of variables still seems confusing to you, take comfort 
in the fact that even seasoned researchers can still get turned around on 
these issues. As with most aspects of research, repeated exposure to (and 
experience with) these concepts tends to breed a comfortable level of fa- 
miliarity. So, the next time you come across a research study, practice iden- 
tifying the different types of variables that we have discussed in this section. 

RESEARCH PARTICIPANTS 

Selecting participants is one of the most important aspects of planning 
and designing a research study. For reasons that should become clear as 
you read this section, selecting research participants is often more difficult 
and more complicated than it may initially appear. In addition to needing 
the appropriate number of participants (which may be rather difficult in 
large-scale studies that require many participants), researchers need to 
have the appropriate kinds of participants (which may be difficult when
resources are limited or the pool of potential participants is small).
Moreover, the manner in which individuals are selected to participate, and
the way those participants are subsequently assigned to groups within the
study, have a dramatic effect on the types of conclusions that can be drawn
from the research study. 

At the outset, it is important to note that not all types of research stud- 
ies involve human participants. For example, the research studies carried 
out in many fields of science, such as physics, biology, chemistry, and 
botany, generally do not involve human participants. For the research sci- 
entists in these fields, the unit of study may be an atom, a cell, a molecule, 
or a flower, but not a human participant. However, for those researchers 
who are involved in other types of research, such as social science research, 
the majority of their studies will involve human participants in some ca- 
pacity. Therefore, it is important that you become familiar with the proce- 
dures that are commonly employed by researchers to select an appropriate 
group of study participants and assign those participants to groups within 
the study. This section will address these two important tasks. 

Before proceeding any further, it is worth noting that when a researcher 
is planning a study, he or she must choose an appropriate research design 
prior to selecting study participants and assigning them to groups. In fact, 
the specific research design used in a study often determines how the par- 
ticipants will be selected for inclusion in the study and how they will be as- 
signed to groups within it. However, because the topic of choosing an ap- 
propriate research design requires an extensive and detailed discussion, we 
have set aside an entire chapter to cover that topic (see Chapter 5). There- 
fore, when reading this section, it is important to keep in mind that the 
tasks of selecting participants and assigning those participants to groups 
typically take place after you have chosen an appropriate research design. 
Accordingly, you may want to reread this section after you have read the 
chapter on research designs (Chapter 5). 

Selecting Study Participants 

For those research studies that involve human participants, the selection of 
the study participants is of the utmost importance. There are several ways
in which potential participants can be selected for inclusion in a research
study, and the manner in which participants are selected is determined by 
several factors, including the research question being investigated, the re- 
search design being used, and the availability of appropriate numbers and 
types of study participants. In this section, we will discuss the most com- 
mon methods used by researchers for selecting study participants. 

For some types of research studies, specific research participants (or 
groups of research participants) may be sought out. For example, in a qual- 
itative study investigating the combat experiences of World War II veter- 
ans, the researcher may simply approach identified World War II veterans 
and ask them to participate in the study. Another example would be an in- 
vestigation of the effects of a Head Start program among preschool stu- 
dents. In this situation, the researcher may decide to study an already ex- 
isting preschool class. The researcher could randomly select preschool 
students to participate in the study, but would probably save both time and 
money by using a preexisting group of students. 

As you can probably imagine, there are some difficulties that arise when 
researchers use preexisting groups or target specific people for inclusion 
in a research study. The primary difficulty is that the study results may not 
be generalizable to other groups or other individuals (i.e., groups or indi- 
viduals not in the study). For example, if a researcher is interested in draw- 
ing broad conclusions about the effects of a Head Start program on 
preschool students in general, the researcher would not want to limit par- 
ticipation in the study to one specific group of preschool students from 
one specific preschool. For the results of the study to generalize beyond 
the sample used in the study, the sample of preschool students in the study 
would have to be representative of the entire population of preschool stu- 
dents. 

We have introduced quite a few new terms and concepts in this discus- 
sion, so we need to make sure that we are all on the same page before we 
proceed any further. Let's start with generalizability. The concept of
generalizability will be covered in detail in future chapters, so we will not spend
too much time on it here. But we do need to take a moment and briefly
discuss what we mean when we say that the results of a study are (or are not)
generalizable. To make this discussion more digestible, let's look at a brief
example. 

Suppose that a researcher is interested in examining the employment 
rate among recent college graduates. To examine this issue, the researcher 
collects employment data on 1000 recent graduates from ABC University. 
After looking at the data and conducting some simple calculations, the 
researcher determines that 97.5% of the recent ABC graduates obtained 
full-time employment within 6 months of graduation. Based on the results 
of this study, can the researcher reasonably conclude that the employment 
rate for all recent college graduates across the United States is 97.5%? Ob- 
viously not. But why? The most obvious reason is that the recent gradu- 
ates from ABC University may not be representative of recent graduates 
from other colleges. Perhaps recent ABC graduates have more success in 
obtaining employment than recent graduates from smaller, lesser-known 
colleges. As a result, there is likely a great degree of variability in the em- 
ployment rates of recent college graduates across the United States. 
Therefore, it would be misleading and inaccurate to reach a broad conclu- 
sion about the employability of all recent college graduates based exclu- 
sively on the employment experiences of recent ABC graduates. 

In the previous example, the only reasonable conclusion that the re- 
searcher can reach is that 97.5% of the recent ABC graduates in that partic- 
ular study obtained full-time employment within 6 months of graduation. 
This limited conclusion would likely be of little interest to students outside 
ABC University because the results of the study have no implications 
for those other students. For the results of this study to be generalizable 
(i.e., applicable to recent graduates from all colleges, not just ABC) the 
researcher would need to examine the employment rates for recent grad- 
uates from many different colleges. This would have the effect of ensuring 
that the sample of participants is representative of all recent college grad- 
uates. Obviously, it would be most informative and accurate if the re- 
searcher were able to examine the employment rates for all recent gradu- 
ates from all colleges. Then, rather than having to make an inference about 
the employment rate in the population based on the results of the study, 
the researcher would have an exact employment rate.

For obvious reasons, however, it is typically not practical to include
every member of the population of interest (e.g., all recent college gradu- 
ates) in a research study. Time, money, and resources are three limiting 
factors that make this unlikely. Therefore, most researchers are forced 
to study a representative subset — a sample — of the population of interest. 
Accordingly, in our example, the researcher would be forced to study a 
sample of recent college graduates from the population of all recent col- 
lege graduates. (If you need a brief refresher on the distinction between a 
sample and a population, see Chapter 1.) If the sample used in the study is
representative of the population from which it was drawn, the researcher 
can draw conclusions about the population based on the results obtained 
with the sample. In other words, using a representative sample is what al- 
lows researchers to reach broad conclusions applicable to the entire pop- 
ulation of interest based on the results obtained in their specific studies. 
For those of you who are still confused about the concept of generaliz- 
ability, do not fret, because we revisit this issue in later chapters. 
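
A small simulation may help make the idea of representativeness concrete. In the sketch below, only the ABC University figure (97.5% of 1000 graduates) comes from the example above; the other four colleges, their employment rates, and the sample size of 500 are invented for illustration.

```python
import random

# Hypothetical counts of employed graduates at each college. Only the ABC
# figure comes from the example; the other colleges are invented.
employed_counts = {
    "ABC": 975,
    "College B": 600,
    "College C": 700,
    "College D": 800,
    "College E": 900,
}
graduates_per_college = 1000

# Each graduate is recorded as (college, status), where 1 = employed.
population = []
for college, employed in employed_counts.items():
    population += [(college, 1)] * employed
    population += [(college, 0)] * (graduates_per_college - employed)


def employment_rate(graduates):
    """Proportion of the given graduates who are employed."""
    return sum(status for _, status in graduates) / len(graduates)


population_rate = employment_rate(population)
abc_only_rate = employment_rate([g for g in population if g[0] == "ABC"])

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 500)
sample_rate = employment_rate(random_sample)

print(f"Whole population:   {population_rate:.3f}")
print(f"ABC graduates only: {abc_only_rate:.3f}")
print(f"Random sample:      {sample_rate:.3f}")
```

With these made-up numbers, the ABC-only figure overstates the employment rate of the whole population, whereas a sample drawn at random from all five colleges lands much closer to it.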

The discussion up to this point should lead you to an obvious question. 
Specifically, if choosing a representative sample is so important for the 
purposes of generalizing the results of a study, how do researchers go 
about selecting a representative sample from the population of interest? 
The primary procedure used by researchers to choose a representative 
sample is called "random selection." Random selection is a procedure 
through which a sample of participants is chosen from the population of 
interest in such a way that each member of the population has an equal 
probability of being selected to participate in the study (Kazdin, 1992). 
Researchers using the random selection procedure first define the popu- 
lation of interest and then randomly select the required number of partic- 
ipants from the population. 
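
The two steps just described (define the population, then randomly select from it) might look like the following in Python. The list of student identifiers is a hypothetical stand-in for a real roster, and a pseudo-random number generator is used here as one convenient way to give every member the same chance of being chosen.

```python
import random

# Hypothetical roster: an identifier for every member of a narrowly
# defined population (for example, 400 students enrolled in one course).
sampling_frame = [f"student_{n:03d}" for n in range(1, 401)]


def randomly_select(members, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member has the same probability of ending up in the sample.
    """
    rng = random.Random(seed)
    return rng.sample(members, sample_size)


sample = randomly_select(sampling_frame, sample_size=40, seed=7)
print(sample[:5])
```

Passing a seed is optional; it is used here only so that the same sample can be reproduced when the sketch is rerun.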

There are two important points to keep in mind regarding random 
selection. The first point is that random selection is often difficult to ac- 
complish unless the population is very narrowly defined (Kazdin, 1992). 
For example, random selection would not be possible for a population
defined as "all economics students." How could we possibly define "all
economics students"? Would this population include all economics students
in a particular state, or in the United States, or in the world? Would it
include both current and former economics students? Would it include both
undergraduate and graduate economics students? Obviously, the popula- 
tion of "all economics students" is too broad, and it would therefore be 
impossible to select a random sample from that population. By contrast, 
random selection could easily be accomplished with a population defined 
as "all students currently taking introductory economics classes at a par- 
ticular university." This population is sufficiently narrowly defined, which 
would permit a researcher to use random selection to obtain a representa- 
tive sample. 

As you may have noticed, narrowly defining the population of interest, 
which we have stated is a requirement for random selection, has the nega- 
tive effect of limiting the representativeness of the resulting sample. This 
certainly presents a catch-22 — we need to narrowly define the population 
to be able to select a representative sample, but by narrowing the popula- 
tion, we are limiting the representativeness of the sample we choose. 

This brings us to the second point that you should keep in mind re- 
garding random selection, namely, that the results of a study cannot be 
generalized based solely on the random selection of participants from the 
population of interest. Rather, evidence for the generalizability of a study's 
findings typically comes from replication studies. In other words, the most 
effective way to demonstrate the generalizability of a study's findings is to 
conduct the same study with other samples to see if the same results are 
obtained. Obtaining the same results with other samples is the best evi- 
dence of generalizability. 

Despite the limitations that are associated with random selection, it is a 
popular procedure among researchers who are attempting to ensure that 
the sample of participants in a particular study is similar to the population 
from which the sample was drawn. 

Assigning Study Participants to Groups 

Once a population has been appropriately defined and a representative 
sample of participants has been randomly selected from that population,
the next step involves assigning those participants to groups within the
research study, which is one of the most important aspects of conducting research.
In fact, Kazdin (1992) regards the assignment of participants to groups 
within a research study as "the central issue in group research" (p. 85). 
Therefore, it is important that you understand how the assignment of par- 
ticipants is most effectively accomplished and how it affects the types of 
conclusions that can be drawn from the results of a research study. 

There is almost universal agreement among researchers that the most 
effective method of assigning participants to groups within a research 
study is through a procedure called "random assignment." The philosophy 
underlying random assignment is similar to the philosophy underlying 
random selection (see Rapid Reference 2.10). Random assignment involves as- 
signing participants to groups within a research study in such a way that 
each participant has an equal probability of being assigned to any of the 
groups within the study (Kazdin, 1992). Although there are several
accepted methods that can be used to effectively implement random
assignment, it is typically accomplished by using a table of random numbers that
determines the group assignment for each of the participants. (See
Chapter 5 for a discussion and [...]

Rapid Reference 2.10

[...] each participant has an equal probability of being assigned to any
of the groups within the study.

[...] participants are most effectively assigned to groups within a study
(i.e., via random assignment), we should spend some time discussing why
random assignment is so important in the context of research. In short,
random assignment is an effective way of ensuring that the groups within a
research study are equivalent (see Rapid Reference 2.11). More specifically,
random assignment is a dependable procedure for producing equivalent groups
because it evenly distributes characteristics of the sample among all of
the groups within the study (see Kazdin, 1992). For example, rather than
placing all of the participants over age 40 into one group, random
assignment would, theoretically at least, evenly distribute all of the
participants over age 40 among all of the groups within the research
study. This would produce equivalent groups within the study, at least with
respect to age.
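
Random assignment can be sketched in Python as well. The sketch below swaps the table of random numbers for a pseudo-random shuffle, which is an assumption of this illustration and not the procedure described in the text; the participant IDs, ages, and function name are hypothetical.

```python
import random


def randomly_assign(participants, group_names, seed=None):
    """Shuffle the participants, then deal them into the groups in turn.

    Because the order is random, each participant is equally likely to
    land in any group, and the group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    for position, participant in enumerate(shuffled):
        groups[group_names[position % len(group_names)]].append(participant)
    return groups


# Hypothetical participants, recorded as (ID, age).
ages = [22, 25, 31, 38, 41, 44, 47, 52, 58, 63, 67, 71]
participants = [(f"P{n:02d}", age) for n, age in enumerate(ages, start=1)]

groups = randomly_assign(participants, ["experimental", "control"], seed=11)
for name, members in groups.items():
    over_40 = sum(1 for _, age in members if age > 40)
    print(f"{name}: {len(members)} participants, {over_40} over age 40")
```

With only twelve participants the split of those over age 40 will not always be perfectly even, which is why the text says that random assignment distributes characteristics evenly "theoretically at least"; the balance tends to improve as the number of participants grows.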

At this point, you may be wondering why it is so important for a re- 
search study to consist of equivalent groups. The primary importance of 
having equivalent groups within a research study is to ensure that nuisance 
variables (i.e., variables that are not under the researcher's control) do not 
interfere with the interpretation of the study's results (Kazdin, 1992). In 
other words, if you find a difference between the groups on a particular de- 
pendent variable, you want to attribute that difference to the independent 
variable rather than to a baseline difference between the groups. Let's take 
a moment and explore what this means. In most studies, variables such as 
age, gender, and race are not the primary variables of interest. However, if 
these characteristics are not evenly distributed among all of the groups 
within the study, they could obscure the interpretation of the primary
variables of interest in the study. Let's take a look at a short example that
should help to clarify these concepts.

Rapid Reference 2.11

Group Equivalence

One of the most important aspects of group research is isolating the
effects of the independent variable. To accomplish this, the experimental
group and control group should be identical, except for the independent
variable. The independent variable would be present in the experimental
group, but not in the control group. Assuming this is the only difference
between the two groups, any observed differences on the dependent
variable can reasonably be attributed to the effects of the independent
variable.

A researcher interested in measuring the effects of a new memory en- 
hancement strategy conducts a study in which one group (i.e., the experi- 
mental group) is taught the memory enhancement strategy and the other 
group (i.e., the control group) is not taught the memory enhancement 
strategy. Then, all of the participants in both groups are administered a test 
of memory functioning. At the conclusion of the study, the researcher 
finds that the participants who were taught the new strategy performed 
better on the memory test than the participants who were not taught the 
new strategy. Based on these results, the researcher concludes that the 
memory enhancement strategy is effective. However, before submitting 
these impressive results for publication in a professional journal, the re- 
searcher realizes that there is a slight quirk in the composition of the two 
groups in the study. Specifically, the researcher discovers that the experi- 
mental group is composed entirely of women under the age of 30, while 
the control group is composed entirely of men over the age of 60. 

The unfortunate group composition in the previous example is quite 
problematic for the researcher, who is understandably disappointed in this 
turn of events. Without getting too complicated, here is the problem in a 
nutshell: Because the two study groups differ in several ways — exposure 
to the memory enhancement strategy, age, and gender — the researcher 
cannot be sure exactly what is responsible for the improved memory per- 
formance of the participants in the experimental group. It is possible, for 
example, that the improved memory performance of the experimental 
group is not due to the new memory enhancement strategy, but rather to 
the fact that the participants in that group are all under age 30 and, there- 
fore, are likely to have better memories than the participants who are over 
age 60. Alternatively, it is possible that the improved memory perfor- 
mance of the experimental group is somehow related to the fact that all of 
the participants in that group are women. In summary, because the mem- 
ory enhancement strategy was not experimentally isolated and controlled 
(i.e., it was not the only difference between the experimental and control 
groups), the researcher cannot be sure whether it was responsible for the
observed differences between the groups on the memory test. 

As stated earlier in this section, the purpose of random assignment is to 
distribute the characteristics of the sample participants evenly among 
all of the groups within the study. By using random assignment, the
researcher distributes nuisance variables unsystematically across all of the
groups (see Kazdin, 1992). Had the researcher in our example used ran- 
dom assignment, the male participants over age 60 and the female partic- 
ipants under age 30 would have been evenly distributed between the ex- 
perimental group and the control group. (See Rapid Reference 2.12 for a 
discussion of testing for group equivalence.) 
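
The random assignment just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name randomly_assign, the seed value, and the sample of 40 younger women and 40 older men are assumptions made up for this example, not part of any particular study.

```python
import random

def randomly_assign(participants, n_groups=2, seed=None):
    """Shuffle participants and deal them into n_groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical sample: 40 women aged 25 and 40 men aged 65.
sample = [("F", 25)] * 40 + [("M", 65)] * 40
experimental, control = randomly_assign(sample, n_groups=2, seed=1)

def share_over_60(group):
    """Proportion of a group that is over age 60."""
    return sum(1 for _, age in group if age > 60) / len(group)

print(len(experimental), len(control))
print(round(share_over_60(experimental), 2), round(share_over_60(control), 2))
```

Because chance alone decides who lands in which group, the proportion of older participants should be roughly, though not exactly, the same in both groups.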

If the sample size is large enough, the researcher can assume that the 
nuisance variables are evenly distributed among the groups, which in- 
creases the researcher's confidence in the equivalence of the groups 
(Kazdin, 1992). This last point should not be overlooked. Random as- 
signment is most effective with a large sample size (e.g., more than 40 par- 
ticipants per group). In other words, the likelihood of obtaining equivalent 
groups increases as the sample size increases. Once participants have been
randomly assigned to groups within the study, the researcher is then ready
to begin collecting data. (Both random selection and random assignment
will be discussed in more detail in Chapter 3 as strategies for controlling
artifact and bias.)

Rapid Reference 2.12

Equivalence Testing

Although using random assignment with large samples can be assumed to
produce equivalent groups, it is wise to statistically examine whether the
two groups are indeed equivalent. This is accomplished by comparing the
two groups on nuisance variables to see whether the two groups differ
significantly. If there are no statistically significant differences between the
two groups on any of the nuisance variables, the researcher can be confident
that the two groups are equivalent. In this situation, any observed effects
on the dependent variables can reasonably be attributed to the independent
variable (and not to any of the nuisance variables). By contrast, if
the two groups are not equivalent on one or more of the nuisance variables,
there are statistical steps that a researcher can take to ensure that
the differences do not affect the interpretation of the study's results.
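
The comparison of groups on a nuisance variable described in Rapid Reference 2.12 can be illustrated with a simple permutation test on age. This is a minimal sketch under stated assumptions: the text does not name a particular statistical test, so the permutation test is our choice for illustration, and the function name and age values are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reshuffle the group labels and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical ages after random assignment (similar groups).
ages_experimental = [24, 31, 45, 52, 38, 29, 60, 41]
ages_control = [27, 33, 47, 50, 36, 30, 58, 43]
p_similar = permutation_p_value(ages_experimental, ages_control)

# Hypothetical ages resembling the flawed memory study (very different groups).
ages_young = [22, 23, 24, 25, 26, 27, 28, 29]
ages_old = [61, 62, 63, 64, 65, 66, 67, 68]
p_confounded = permutation_p_value(ages_young, ages_old)

print(p_similar, p_confounded)
```

A large p value for a nuisance variable gives no evidence that the groups differ on it, whereas a small p value, as in the second comparison, signals a baseline difference that would need to be addressed before interpreting the results.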

MULTICULTURAL CONSIDERATIONS 

One final and important topic in this chapter is the relationship between 
multicultural issues and research studies. In research, as in most other ar- 
eas of life at the beginning of the 21st century, considerations surround- 
ing multiculturalism (see Rapid Reference 2.13) have taken on increased 
visibility and importance. As a result, there is a growing need for re- 
searchers at all levels and in all settings to become familiar with the role of 
multiculturalism in all aspects of research studies. 

Multicultural considerations are important in two distinct ways when it 
comes to conducting research studies. First, multicultural considerations 
often have a considerable effect on a researcher's choice of research ques- 
tion and research design (even if the researcher is unaware of the role 
played by multicultural considerations in those decisions). Second,
multicultural considerations are important in the selection and composition
of the sample of participants used in particular research studies. In
other words, multicultural considerations are important with respect
to both the researcher and the study sample. This section will
address both of these important considerations.

Rapid Reference 2.13

Multiculturalism

When considered in its broadest sense, a researcher who has
achieved multicultural competence is cognizant of differences among
study participants related to race, ethnicity, language, sexual
orientation, gender, age, disability, class status, education, and
religious or spiritual orientation (American Psychological
Association, 2003).

Multiculturalism and Researchers

As the population of the United States becomes increasingly diverse,
there is a growing need for researchers to become more aware of the
impact of multicultural issues on the planning and designing of research 
studies (Reid, 2002). Using the current lingo, it can be stated that there 
is a need for researchers to achieve "multicultural competence." For re- 
searchers, the first step in achieving multicultural competence is becom- 
ing aware of how their own worldviews affect their choice of research 
questions (American Psychological Association [APA], 2003). These 
worldviews necessarily include researchers' views of their own cultures as 
well as their views of other cultures. Researchers must acknowledge that 
their worldviews likely play an integral role in shaping their views of hu- 
man behavior. Hence, their theories of human behavior, as well as the re- 
search questions and hypotheses that stem from those theories, are based 
on assumptions particular to their own culture — and it is these assump- 
tions of which researchers must be aware (see Egharevba, 2001). 

To increase awareness of multicultural issues in the conceptualization 
of research designs, the researcher often benefits from consulting with 
members of diverse and traditionally underrepresented cultural groups 
(APA, 2003; Quintana, Troyano, & Taylor, 2001). This serves the purpose 
of providing perspectives and insights that may not have otherwise been 
considered by the researcher acting alone. Considering different view- 
points from members of diverse cultural groups facilitates the develop- 
ment of a culturally competent research design that has the potential to 
benefit people from many different cultures. Along similar lines, it is also 
important for researchers to recognize the limitations of their research de- 
signs in terms of applicability to diverse cultural groups. 

Researchers also need to be aware of multicultural considerations when 
deciding on assessment techniques and instruments for their studies. For 
example, when working with a culturally diverse sample, it is important 
that researchers use instruments and assessment techniques that have 
been validated with culturally diverse groups (see Council of National 
Psychological Associations for the Advancement of Ethnic Minority
Interests, 2000). According to the APA's Guidelines on Multicultural Education,
Training, Research, Practice, and Organizational Change for Psychologists (2003,
p. 389), "psychological researchers are urged to consider culturally
sensitive assessment techniques, data-generating procedures, and standardized
instruments whose validity, reliability, and measurement equivalence have 
been tested across culturally diverse sample groups. . . ." 

Finally, when it comes to interpreting data and drawing conclusions, re- 
searchers need to consider the role of culture and cultural hypotheses. It 
is conceivable, for example, that there is a culturally based explanation for 
the research study's findings, and it therefore may be prudent to statisti- 
cally examine relevant cultural variables. Researchers also need to be cog- 
nizant of the cultural limitations and generalizability of the research 
study's results. 

Multiculturalism and Study Participants 

In the preceding section, we emphasized the importance of multicultural 
considerations in terms of formulating a research question, choosing an 
appropriate research design, selecting assessment strategies, and analyzing 
data and drawing conclusions. In this section, we will focus on multicul- 
tural considerations as they relate to selecting the research participants 
who make up the study sample. As you will see, the inclusion of people 
from diverse cultural backgrounds in study samples has attracted a great 
deal of attention in recent years. 

The debate regarding the appropriate composition of study samples 
is no longer exclusively in the domain of researchers. The federal govern- 
ment has voiced an opinion on this important issue. In 1993, President 
Clinton signed into law the NIH Revitalization Act of 1993 (PL 103-43), 
which directed the National Institutes of Health (NIH) to establish guide- 
lines for the inclusion of women and minorities in clinical research. On 
March 9, 1994, in response to the mandate contained in the NIH Revital- 
ization Act, the NIH issued NIH Guidelines on the Inclusion of Women and Mi- 
norities as Subjects in Clinical Research (henceforth "NIH Guidelines"). 

According to the NIH Guidelines, because research is designed to pro- 
vide scientific evidence that could lead to a change in health policy or a 
standard of care, it is imperative to determine whether the intervention
being studied affects both genders as well as diverse racial and ethnic groups
differently. Therefore, all NIH-supported biomedical and behavioral
research involving human participants is required to be carried out in a
manner that elicits information about individuals of both genders and 
from diverse racial and ethnic backgrounds. According to the Office for 
Protection From Research Risks, which is part of the U.S. Department of 
Health and Human Services, the inclusion of women and minorities in re- 
search will, among other things, help to increase the generalizability of the 
study's findings and ensure that women and minorities benefit from the 
research. Although the NIH Guidelines apply only to studies conducted or 
supported by the NIH, all other researchers and research institutions are 
encouraged to include women and minorities in their research studies, as 
well. 

SUMMARY 

In this chapter, we have covered the research-related issues that are most 
commonly encountered by researchers when they are planning and de- 
signing research studies. There are certainly other topics related to plan- 
ning and designing a research study that we could have included in this dis- 
cussion (e.g., choosing study instruments), but we chose to take a broad 
approach because of the inherent uniqueness of research studies. Rather 
than discussing topics that apply only to particular types of studies, we
believed that it would be most beneficial to make the discussion more
general by focusing on the research-related topics that are encountered by
virtually all researchers when planning and designing studies.
TEST YOURSELF

1. Researchers become familiar with the existing literature on a particular
topic by conducting a ________.

2. Researchers use ________ to attempt to explain, predict, and explore the
phenomenon of interest.

3. The ________ hypothesis always predicts that there will be no
differences between the groups being studied.

4. The ________ is a measure of the effect (if any) of the
independent variable.

5. The most effective method of assigning participants to groups within a
research study is through a procedure called ________.

Answers: 1. literature review; 2. hypotheses; 3. null; 4. dependent variable; 5. random
assignment
Three

GENERAL APPROACHES FOR 
CONTROLLING ARTIFACT AND BIAS 



In Chapter 6, we will discuss the four main types of experimental valid- 
ity and the potential threats associated with each. These threats are also 
referred to as confounds, or sources of artifact and bias. Remember that 
we conduct research to systematically study specified variables of interest. 
Any variable that is not of interest, but that might influence the results, can 
be referred to as a potential confound, artifact, or source of bias. The pri- 
mary purpose of research design is to eliminate these sources of bias so 
that more confidence can be placed in the results of the study. Identifying 
potential sources of artifact and bias is therefore an essential first step in 
ensuring the integrity of any conclusions drawn from the data obtained 
during a study. Once the threats are identified, appropriate steps can be 
taken to reduce their impact. 

Unfortunately, even the most seasoned researchers cannot account for 
or foresee every potential source of artifact and bias that might confound 
the results or be present in a research design. In this chapter, we will dis- 
cuss general strategies and controls that can be used to reduce the impact 
of artifact and bias. These strategies are very useful in that they help reduce 
the impact of artifact and bias even when the researcher is not aware that 
they exist in the study. These strategies should be considered early in the 
design phase of a research study. Early consideration allows the researcher 
to take a proactive, preventive approach to potential artifacts and biases 
and minimizes the need to be reactionary as problems arise later in the 
study. Early consideration cannot be overemphasized because the worth 
of the findings of any research study is directly related to the reduction or 
elimination of confounding sources of artifact and bias. Implementing
these basic strategies also reduces threats to validity and bolsters the con- 
fidence we can place in the findings of a study. 

A BRIEF INTRODUCTION TO VALIDITY 

Our introduction to this chapter suggests that the purpose of research is 
to provide valid conclusions regarding a wide range of researchable phe- 
nomena. Although we discuss it in detail in Chapter 6, a brief discussion of 
the concept of validity is necessary here to frame our general discussion of 
the experimental control of artifact and bias. Validity refers to the concep- 
tual and scientific soundness of a research study or investigation, and the 
primary purpose of all forms of research is to produce valid conclusions. 
Researchers are usually interested in studying the relationship of spe- 
cific variables at the expense of other, perhaps irrelevant, variables. To 
produce valid, or meaningful and accurate, conclusions researchers must 
strive to eliminate or minimize the effects of extraneous influences, vari- 
ables, and explanations that might detract from the accuracy of a study's 
ultimate findings. Put simply, validity is related to research methodology 
because its primary purpose is to increase the accuracy and usefulness of 
findings by eliminating or controlling as many confounding variables as 
possible, which allows for greater confidence in the findings of any given 
study. Chapter 6 further discusses the main types of validity and the spe- 
cific threats related to each, so we will not go into any more detail about 
the subject in this chapter. The remaining material in this chapter will dis- 
cuss general design strategies that can be used to help ensure that the con- 
clusions drawn from the results of a study are valid. 

SOURCES OF ARTIFACT AND BIAS 

In Chapter 6, we discuss the most common threats to validity. The mater- 
ial in Chapter 6 is very specific to the four main types of validity encoun- 
tered in research design and methodology — internal, external, construct, 
and statistical conclusion validity (see Rapid Reference 3.1). By contrast,
the aim of this chapter is more general. While Chapter 6 discusses specific
artifacts, biases, and confounds as they relate to the four main types of
validity, this chapter provides valuable information on general sources of
artifact and bias that can exist in most forms of research design. It also
provides a framework for minimizing or eliminating a wide variety of these
confounds without directly addressing specific threats to validity.

Rapid Reference 3.1

Four Types of Validity

Internal validity refers to the ability of a research design to rule out
or make implausible alternative explanations of the results, or plausible
rival hypotheses. (A plausible rival hypothesis is an alternative
interpretation of the researcher's hypothesis about the interaction of the
dependent and independent variables that provides a reasonable explanation
of the findings other than the researcher's original hypothesis.)

External validity refers to the generalizability of the results of a
research study. In all forms of research design, the results and conclusions
of the study are limited to the participants and conditions as defined by
the contours of the research. External validity refers to the degree to
which research results generalize to other conditions, participants,
times, and places.

Construct validity refers to the basis of the causal relationship and is
concerned with the congruence between the study's results and the
theoretical underpinnings guiding the research. In essence, construct
validity asks the question of whether the theory supported by the
findings provides the best available explanation of the results.

Statistical validity refers to aspects of quantitative evaluation that
affect the accuracy of the conclusions drawn from the results of a study.
At its simplest level, statistical validity addresses the question of
whether the statistical conclusions drawn from the results of a study
are reasonable.

Although sources of artifact and bias can be classified across a number 
of broad categories, these categories are far from all-inclusive or exhaus- 
tive. The reason for this is that every research study is distinct and is faced 
with its own unique sources of artifact and bias that may threaten the
validity of its findings. In addition, sources of artifact and bias can
occur in isolation or in combination, further compounding the potential
threats to validity. Researchers must be aware of these potential
threats and control for them accordingly. Failure to implement
appropriate controls at the outset of a study may substantially
reduce the researcher's ability to draw confident inferences of
causality from the study findings. Fortunately, there are several ways that
the researcher can control for the effects of artifact and bias. The most
effective methods include the use of statistical controls, control and
comparison groups, and randomization (a more complete list is found in Rapid
Reference 3.2).

Rapid Reference 3.2

Methods for Controlling Sources of Artifact

• Statistical controls

• Control and comparison groups

• Random selection

• Random assignment

• Experimental design

A short discussion of sources of artifact and bias is necessary before we 
can address methods for minimizing or eliminating their impact on the 
validity of study findings. As mentioned, the types of potential sources of 
artifact and bias are virtually endless — for example, the heterogeneity of 
research participants alone can contribute innumerable sources. Research 
participants bring a wide variety of physical, psychological, and emotional 
traits into the research context. These different characteristics can directly 
affect the results of a study. Similarly, an almost endless array of environ- 
mental factors can influence a study's results. For example, consider what 
your level of attention and/or motivation might be like in an excessively
warm classroom versus one that is comfortable and conducive to learning. 
As you will see in Chapter 4, measurement issues can also introduce arti- 
fact and bias into the study. The use of poorly validated or unreliable mea- 
surement strategies can contribute to misleading results (Leary, 2004; 
Rosenthal & Rosnow, 1969). To make matters worse, sources of artifact 
and bias can also combine and interact (e.g., as when one is taking a poorly 
validated test in an uncomfortable classroom) to further reduce the
validity of study findings. Despite the potentially infinite types and
combinations of artifact and bias, they can generally be seen as falling into one of
several primary categories. 



Experimenter Bias 

Ironically, the researchers themselves are the first common source of
artifact and bias (Kintz, Delprato, Mettee, Persons, & Shappe, 1965).
Frequently called experimenter bias, this source of artifact and bias refers to the
potential for researchers themselves to inadvertently influence the
behavior of research participants in a certain direction (Adair, 1973; Beins,
2004). In other words, a researcher who holds certain beliefs about
the nature of his or her research and how the results will or should
turn out may intentionally or unintentionally influence the outcome
of the study in a way that favors his or her expected outcome
(Barber & Silver, 1968); the Rosenthal and Pygmalion effects
(see Rapid Reference 3.3) are examples.

Experimenter bias can manifest itself across a wide variety of
circumstances and settings. For example, a researcher might interpret
data in such a way that it supports his or her theoretical orientation
or a particular theoretical paradigm. Similarly, the researcher might
be tempted to change the original research hypotheses to fit the actual data
when it becomes apparent that the data do not support the original
hypotheses. A related bias occurs when researchers blatantly ignore findings
that do not support their hypotheses. Other, more innocuous examples
include subtle errors in data collection and recording and unintentional
deviations from standardized procedures. These biases are particularly
prevalent in studies in which a single researcher is responsible for
generating the hypotheses, designing the study, and collecting and analyzing the
data (Barber, 1976). Let's now consider how experimenter bias might
specifically manifest itself in the context of research methodology.

Rapid Reference 3.3

The Rosenthal and Pygmalion Effects

The Rosenthal and Pygmalion effects are examples of experimenter
bias. Both of these terms refer to the documented phenomenon
that researchers' expectations (rather than the experimental
manipulation) can bias the outcome of a study by influencing
the behavior of their participants.

DON'T FORGET

Experimenter Bias

Experimenter bias exists when researchers inadvertently influence
the behavior of research participants in a way that favors the
outcomes they anticipate.

Consider an example in which a researcher is studying the efficacy of 
different types of psychotherapy. The study is comparing three different 
types of therapy, and our researcher has a personal belief that one of the 
three is superior to the other two treatments. Our researcher is involved in 
conducting screening assessments of symptom levels, and based on those 
results, assigns participants to the different treatment conditions. The re- 
searcher's personal interest in one particular form of therapy might lead to 
the introduction of a potential source of artifact or bias. For example, if the 
researcher thinks that his or her therapeutic preference is superior, then 
individuals with greater symptom levels might be unconsciously (or inad- 
vertently) assigned to that treatment group. Here, the underlying bias 
might be that a superior form of treatment is necessary to help the partic- 
ipants in question. This could work in the other direction as well, when the 
researcher unconsciously (or inadvertently) assigns participants with low 
symptom levels to the treatment of choice. Either approach can bias the 
results and blur the findings as they relate to the relationship between the 
intervention and symptom level, or independent and dependent variables. 
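
To see how such an assignment can blur the results, consider the following small Python sketch. All of the numbers are hypothetical, and the sketch assumes, purely for illustration, that both therapies reduce every participant's symptom score by the same 5 points, so that no true difference between the treatments exists.

```python
import random
from statistics import mean

# Hypothetical baseline symptom scores (higher = more severe).
baseline = [12, 35, 18, 40, 22, 38, 15, 33, 20, 36]

# Assumption for illustration: every therapy lowers each score by 5 points.
IMPROVEMENT = 5

def post_treatment_mean(scores):
    """Mean symptom score after treatment under the equal-effect assumption."""
    return mean(score - IMPROVEMENT for score in scores)

# Biased assignment: severe cases (score >= 30) are steered,
# consciously or not, toward the researcher's preferred therapy.
preferred = [s for s in baseline if s >= 30]
other = [s for s in baseline if s < 30]
biased_gap = post_treatment_mean(preferred) - post_treatment_mean(other)

# Random assignment: chance alone decides who receives which therapy.
rng = random.Random(0)
shuffled = list(baseline)
rng.shuffle(shuffled)
group_a, group_b = shuffled[:5], shuffled[5:]
random_gap = post_treatment_mean(group_a) - post_treatment_mean(group_b)

print(biased_gap, random_gap)
```

Under the biased assignment, the preferred therapy appears to leave participants with far higher symptom levels even though the two treatments are equally effective by construction. The gap reflects who was assigned to each group, not what the therapies did.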

A subtler example could simply be the fact that the researcher uncon- 
sciously treats some participants differently from others during the ad- 
ministration of the screening or other aspects of the treatment interven- 
tions. Perhaps the researcher is having a particularly bad or stressful day 
and is not as engaging or amiable as he or she might otherwise be while in- 
teracting with the participants. Participants might feel somewhat different 
after interacting with the researcher and this might have an impact on their 
self-report of symptoms or their attitudes toward engaging in the study. 
Another example of experimenter bias is related to training and
sophistication. Like people in general, researchers possess varying levels of
knowledge and sophistication, which can have a significant impact on any
study. Consider our previous therapy example. Let's assume that three
different researchers are conducting the therapeutic interventions. One 
researcher has 20 years of experience, the other has 10, and the final one 
is just out of graduate school and has little practical experience. Any re- 
sults that we might obtain from this study might be a reflection more of 
therapist experience than of the nature and effectiveness of the three dif- 
ferent types of therapy. Although subtle, experimenter biases can have a 
significant impact on the validity of the research findings because they 
can blur the relationship between the independent and dependent vari- 
ables. 

Controlling Experimenter Bias 

As just mentioned, experimenter bias can have substantial negative im- 
pacts on the overall validity of a study. Fortunately, there are a number of 
strategies (listed in Rapid Reference 3.4) that can be employed to minimize 
the impact of these biases. 

The first strategy is to maintain careful control over the research pro- 
cedures. The goal of this approach is to hold study procedures constant, 
in an attempt to minimize unforeseen variance in the research design. In 
other words, all procedures should be carefully standardized. This might 
include the use of manualized study procedures, standardized instru- 
ments, and uniform scripts for interacting with research participants. 
Some studies go so far as to try to anticipate participant questions and be- 
haviors and script out appropriate responses for researchers to follow. 

Typically, this type of control is limited to the recruitment and assess- 
ment of participants and to the giving of standardized instructions 
throughout the study. Inclusion criteria and standards are usually devel- 
oped to ensure that only appropriate participants are included in the study. 
Achieving this type of control is more difficult than it might sound. Re- 
member that research participants bring a wide range of individual
differences to any research study. Despite this, there are other steps related to
constancy that researchers can employ to minimize the impact of
experimenter bias.

Rapid Reference 3.4

Strategies for Minimizing Experimenter Effects

• Carefully control or standardize all experimental procedures.

• Provide training and education on the impact and control of
experimenter effects to all of the researchers involved in the study.

• Minimize dual or multiple roles within the study.

• When multiple researcher roles are necessary, provide appropriate
checks and balances and quality control procedures, whenever possible.

• Automate procedures, whenever possible.

• Conduct data collection audits and ensure accuracy of data entry.

• Consider using a statistical consultant to ensure impartiality of results
and choice of appropriate statistical analyses.

• Limit the knowledge that the researcher or researchers have regarding
the nature of the hypotheses being tested, the experimental
manipulation, and which participants are either receiving or not receiving the
experimental manipulation.
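
Two of the strategies listed in Rapid Reference 3.4, automating procedures and limiting what researchers know about who receives the experimental manipulation, can be combined in a short Python sketch. This is only one possible approach under stated assumptions: the function name blinded_allocation, the neutral codes "A" and "B", and the idea of a third party holding the key are hypothetical details added for illustration.

```python
import random

def blinded_allocation(participant_ids, conditions=("treatment", "control"), seed=None):
    """Assign participants to neutral codes; keep the code-to-condition key apart."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Neutral codes ("A", "B", ...) reveal nothing about the condition.
    codes = [chr(ord("A") + i) for i in range(len(conditions))]
    # Shuffle the conditions so that "A" does not always mean the same thing.
    hidden = list(conditions)
    rng.shuffle(hidden)
    key = dict(zip(codes, hidden))
    # Deal the shuffled participants across the codes in turn.
    schedule = {pid: codes[i % len(codes)] for i, pid in enumerate(ids)}
    return schedule, key

# The researcher who interacts with participants sees only the schedule;
# the key would be held by someone not involved in data collection.
schedule, key = blinded_allocation(range(1, 11), seed=42)
print(schedule)
```

Because the person collecting data works only from the coded schedule, his or her expectations about the treatment have less opportunity to influence how individual participants are handled.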

One of the more common approaches to achieving constancy is to pro- 
vide training and education on the impact and control of experimenter ef- 
fects to all of the researchers involved in the study. Although it has been 
said that ignorance is bliss, this is usually not the case in research design. 
Ignorance of the potential impact of researcher behavior and attitudes on 
the results of a study is a common source of bias that can be easily ad- 
dressed through education and training. Awareness of the potential impact 
of behavior is usually the first step in making sure that the behavior does 
not go unregulated or unchecked in a research context. Training and edu- 
cation are essential when there are varying levels of expertise among re- 
searchers or when the researchers have enlisted the help of support staff 
who possess little experience in conducting research. At a minimum,
training in this area should include a discussion of the most common types of
experimenter effects and how they are best minimized or eliminated. 

As noted previously, there are numerous types of experimenter effects 
that can bias the results of a study. Some can be minimized through
awareness and training, and others through standardized procedures. We also
mentioned that experimenter effects might be more prevalent when one 
individual is acting in multiple roles within the study. This is particularly 
true in smaller studies for which funding and resources are limited, such as 
graduate school dissertation research. 

The problem that this might produce in light of experimenter effects is 
an apparent one: temptation. The solution is relatively simple — use mul- 
tiple researchers and provide appropriate checks and balances and quality 
control procedures whenever possible. It might also be helpful to divide 
responsibilities in a way that minimizes possible confounds and tempta- 
tions to act in a way that might be inconsistent with drawing valid conclu- 
sions from the results of the study. Let's consider some examples. 

Checks and balances, or quality control procedures, are essential for 
eliminating potential experimenter biases. As discussed previously, stan- 
dardized procedures are the first step in ensuring the strength of a research 
design. Participant inclusion criteria, scripts, standardized interventions, 
and control of the experimental environment are all examples of stan- 
dardizing various aspects of a research design. There are other steps re- 
lated to standardization that can be taken to further bolster validity and 
minimize potential experimenter effects. Unfortunately, many of these ap- 
proaches are labor intensive and require multiple researchers. When the 
inclusion of multiple researchers is not possible, informal consultation 
with knowledgeable colleagues should be utilized whenever possible. 

Most studies begin with developing the research question, construction 
of the research design, and generation of hypotheses. Having multiple re- 
searchers involved in planning a research study brings a diversity of views 
and opinions that should minimize the likelihood of a poorly conceptual- 
ized research design. With an effective and appropriate design in place, 
multiple researchers can also be used to ensure that other aspects of the
study are executed in a way that helps minimize or eliminate experimenter
bias. For example, multiple researchers could develop participant inclu- 
sion and exclusion criteria. Similarly, participant inclusion might be de- 
pendent on agreement by two or more researchers as to whether the par- 
ticipant meets the required criteria. 
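
The agreement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Screening` record, the two-rater setup, and the "needs review" category for disagreements are hypothetical choices, not a procedure prescribed by the text.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """One participant's eligibility judgments from two independent raters (hypothetical)."""
    participant_id: str
    rater_a_eligible: bool
    rater_b_eligible: bool


def consensus_enrollment(screenings):
    """Enroll only when both raters agree the inclusion criteria are met."""
    enrolled, excluded, review = [], [], []
    for s in screenings:
        if s.rater_a_eligible and s.rater_b_eligible:
            enrolled.append(s.participant_id)
        elif not s.rater_a_eligible and not s.rater_b_eligible:
            excluded.append(s.participant_id)
        else:
            # Raters disagree: flag the case for discussion instead of deciding alone.
            review.append(s.participant_id)
    return enrolled, excluded, review


screenings = [
    Screening("P01", True, True),
    Screening("P02", True, False),
    Screening("P03", False, False),
]
enrolled, excluded, review = consensus_enrollment(screenings)
print("enrolled:", enrolled)
print("excluded:", excluded)
print("needs review:", review)
```

Routing disagreements to a review list, rather than letting either rater decide, keeps any single researcher's expectations from determining who enters the study.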

Multiple researchers can also act as a quality control mechanism for the 
actual delivery of the intervention, or independent variable. Again, more 
than one researcher might be involved in designing the intervention re- 
lated to the independent variable, and then in confirming that the inter- 
vention was actually delivered to the participants in the required fashion. 

Data collection and analysis is another area where multiple researchers 
can be an asset to minimizing or eliminating experimenter bias. Audits can 
be conducted to determine whether mistakes were made in the data col- 
lection or data entry processes. Similarly, multiple researchers can help en- 
sure that the correct statistical analyses are conducted and that the results 
are reported in an accurate manner (O'Leary, Kent, & Kanowitz, 1975). A 
statistical expert should be consulted whenever there is uncertainty about 
which statistical approaches might best be used to answer the research 
question. Finally, this approach can be useful in the communication of the 
results of the study because multiple authors bring a more diverse view to 
the conceptualization, interpretation, and application of the findings. 
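
One way such a data entry audit might be carried out is to have two people enter the same records independently and then compare the results. The sketch below assumes that double-entry setup, which the text does not specify; the participant IDs and scores are made up for illustration.

```python
def audit_double_entry(entry_a, entry_b):
    """Return (participant_id, value_a, value_b) for every record that differs.

    entry_a and entry_b map participant IDs to the value each person typed in.
    A record missing from one of the two entries is reported with None.
    """
    discrepancies = []
    for pid in sorted(set(entry_a) | set(entry_b)):
        a = entry_a.get(pid)
        b = entry_b.get(pid)
        if a != b:
            discrepancies.append((pid, a, b))
    return discrepancies


# Hypothetical scores entered by two different research assistants.
entry_a = {"P01": 14, "P02": 22, "P03": 9}
entry_b = {"P01": 14, "P02": 27, "P03": 9}

for pid, a, b in audit_double_entry(entry_a, entry_b):
    print(f"{pid}: first entry {a}, second entry {b}; check the source record")
```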

There are other methodological approaches that allow us to further 
minimize the impact of experimenter bias. Recall from previous para- 
graphs that knowledge about the research hypotheses and the nature of 
the experimental manipulation has the potential to inappropriately influ- 
ence or bias the outcome of a study. It makes intuitive sense that limiting 
this knowledge (if permitted by the specific research design) might have a 
positive impact on the validity of the conclusions drawn from the study 
because it might help to further minimize the potential impact of experi- 
menter effects. 

There are three main approaches or procedures for limiting the knowl- 
edge that researchers have regarding the nature of the hypotheses being 
tested, of the experimental manipulation, and of which participants are 
either receiving or not receiving the experimental manipulation (Christensen,
2004; Graziano & Raulin, 2004). Each of these procedures seeks
to reduce or minimize the researcher's knowledge about the participants 
and about which experimental conditions they are assigned to (Graziano 
& Raulin). 

The first approach is referred to as the double-blind technique, which is the 
most powerful method for controlling experimenter expectancy and re- 
lated bias. This procedure requires that neither the participants nor the re- 
searchers know which experimental or control condition the participants 
are assigned to (Leary, 2004). This often requires that the study be super- 
vised by a person who tracks assignment of participants without inform- 
ing the main researchers of their status (Rosenthal, Persinger, Vikan- 
Kline, & Mulry, 1963). Without this knowledge, it will be very difficult for 
the other researchers to either intentionally or inadvertently introduce 
experimenter bias into the study. 
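
As a rough sketch of how a supervising person might track assignments without revealing them, the Python below produces two separate records: a sealed key that only the supervisor keeps, and a roster of neutral codes that the main researchers work from. The function name, the "KIT-" code format, and the even split into two conditions are all assumptions made for illustration.

```python
import random


def make_blinded_allocation(participant_ids, seed=None):
    """Assign participants to two conditions and hide the result behind neutral codes."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2

    # Sealed key: held only by the person supervising assignment.
    sealed_key = {
        pid: ("treatment" if i < half else "control") for i, pid in enumerate(ids)
    }

    # Neutral codes carry no information about condition.
    codes = rng.sample(range(1000, 10000), len(ids))
    roster = {pid: f"KIT-{code}" for pid, code in zip(ids, codes)}

    # Sort by participant ID so the roster's order does not leak the assignment.
    blinded_roster = dict(sorted(roster.items()))
    return sealed_key, blinded_roster


participant_ids = [f"P{i:02d}" for i in range(1, 31)]
sealed_key, blinded_roster = make_blinded_allocation(participant_ids, seed=1)

# The main researchers would see only the blinded roster.
for pid, code in list(blinded_roster.items())[:3]:
    print(pid, code)
```

In a real double-blind study the participants would also need to be unable to tell the conditions apart; this sketch covers only the record-keeping side.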

For a variety of reasons, it is often not practical or appropriate to use a 
double-blind procedure. This leads us to a discussion of the second most 
effective approach for controlling experimenter bias: the blind technique. 
The blind technique requires that the researcher be kept "blind" or naive 
regarding which treatment or control conditions the participants are in 
(Christensen, 1988). As with the double-blind technique, someone other 
than the researcher assigns the participants to the required control or 
experimental conditions without revealing the information to the re- 
searcher. 

If either the double-blind or blind technique is inappropriate or im- 
practical, the researcher can resort to a third approach to minimizing ex- 
perimenter bias. The final method for accomplishing this is known as the 
partial-blind technique, which is similar to the blind technique except that the 
researcher is kept naive regarding participant selection for only a portion 
of the study. Most commonly, the researcher is kept naive throughout par- 
ticipant selection and assignment to either control or experimental condi- 
tions (Christensen, 1988). 

These three approaches — double-blind, blind, and partial-blind — are 
summarized in Rapid Reference 3.5. We will return to the topic of experimenter
bias in Chapter 5.

Approaches for Limiting Researchers' Knowledge of
Participant Assignment

Double-blind technique: The most powerful method for controlling
researcher expectancy and related bias, this procedure requires that
neither the participants nor the researchers know which experimental
or control condition research participants are assigned to.

Blind technique: This procedure requires that only the researcher be
kept "blind" or naive regarding which treatment or control conditions
the participants are in.

Partial-blind technique: This procedure is similar to the blind technique,
except that the researcher is kept naive regarding participant
selection for only a portion of the study.



Participant Effects 

As just discussed, experimenter effects are a potential source of bias in any 
research study. If the researchers can be a significant source of artifact and 
bias, then it makes both intuitive and practical sense that the participants 
involved in a research project can also be a significant source of artifact 
and bias. Accordingly, we will now discuss a second common form of ar- 
tifact and bias that can introduce significant confounds into a research de- 
sign if not properly controlled. 
This source of artifact and bias is most commonly referred to as
"participant effects."

As the name implies, the participants involved in a research study
can be a significant source of artifact and bias. Just like researchers,
they bring their own unique sets of biases and perceptions into the
research setting. Put simply, participant effects refers to a variety of
factors related to the unique motives, attitudes, and behaviors that
participants bring to any research study (Kruglanski, 1975; Orne, 1962).
For example, is the participant anxious about the process, eager to please
the researcher, or motivated by the fact that he or she is being
compensated for participation? Do the participants think they have figured
out the purpose of the study, and are they acting accordingly? In other
words, are the participants, either consciously or unconsciously, altering
their behavior to the demands of the research setting? (See Rapid
Reference 3.6.)

DON'T FORGET

Participant Effects

Participant effects are a source of artifact and bias stemming from a
variety of factors related to the unique motives, attitudes, and behaviors
that participants bring to any research study.

Participant Effects by Any Other Name . . .

Participant effects are also referred to as "demand characteristics."
Demand characteristics are the tendencies of research participants to act
differently than they normally might simply because they are taking part
in a study. At their most severe, demand characteristics are changes in
behavior that are based on assumptions about the underlying purpose of
the study, which can introduce a significant confound into the study's
findings.

In this regard, participant effects are very similar to experimenter ef- 
fects because they are simply the expression of individual differences, pre- 
dispositions, and biases imposed upon the context of a research design. 
Often, participants are unaware of their own attitudes, predispositions, 
and biases in their day-to-day lives, let alone in the carefully controlled 
context of a research study. 

The impact of participant effects has been thoroughly researched and 
well documented. At the broadest level of conceptualization, research 
suggests that the level of participant motivation and behavior changes 
simply as a result of the person's being involved in a research study. This 
phenomenon is most commonly referred to as the Hawthorne effect. The 
term "Hawthorne effect" was coined as a result of a series of studies that 
lent support to the proposition that participants often change their
behavior merely in response to being observed and out of a desire to be
helpful to the researcher. There are numerous, more specific ways that
participant effects could manifest themselves in the context of a research
design. Many of
these manifestations are directly related to the different roles that a par- 
ticipant might assume within the context of the research study. 

Consider for a moment that most participants in research studies are 
volunteers (Rosen, 1970; Rosnow, Rosenthal, McConochie, & Arms, 
1969). As such, these individuals might be different from other people 
who decide not to participate or do not have the opportunity to partici- 
pate in the study. This is further confounded by the fact that a significant 
amount of research is conducted on college undergraduates enrolled in 
introductory-level psychology courses. Often, participation in research is 
tied to course credit or some other form of external motivation or reward. 
Accordingly, volunteer participants might be different from the general 
population as a whole, and the conclusions drawn from the study might be 
limited to this specific population. Therefore, even volunteer status may 
result in a participant effect because volunteers are a unique subset of the 
population with distinct characteristics that can have a significant impact 
on the results of the study. 

Some commentators have taken the concept of participant effects to an 
even more refined level by identifying the different "roles" that a partici- 
pant might consciously or unconsciously adopt in the context of a re- 
search study (Rosnow, 1970; Sigall, Aronson, & Van Hoose, 1970; Spin- 
ner, Adair, & Barnes, 1977). Although there is some disagreement about 
the existence and exact classification of participant roles, the most com- 
monly discussed roles include the "good," the "negativistic," the "faith- 
ful," and the "apprehensive" participant roles (Kazdin, 2003c; Weber & 
Cook, 1972). 

The "good" participant might attempt to provide information and re- 
sponses that might be helpful to the study, while the "negativistic" partic- 
ipant might try to provide information that might confound or undermine 
it. The "faithful" participant might try to act without bias, while the "ap- 
prehensive" participant might try to distort his or her responses in a way 
that portrays him or her in an overly positive or favorable light (Kazdin, 
2003c). Regardless of the role or origin, participant effects, either alone or 
in combination, can have a direct impact on the attitudes of research
participants, which in turn can have an impact on the overall validity of the
study. Specifically, participant effects can undermine both the internal and 
external validity of a study. Internal and external validity are discussed in 
detail in Chapter 6. 

Controlling Participant Effects 

As with experimenter effects, researchers should consider and attempt to 
control for the impact of participant effects. And, as with other sources of
bias, the potential impact of these effects should be considered early on
during the design phase of the study. Conveniently, one of the methods for 
controlling participant effects is exactly the same as one for controlling 
experimenter effects, namely, the use of the double-blind technique. Re- 
member that this procedure requires that neither the participants nor the 
researchers know which experimental or control conditions the partici- 
pants are assigned to. Without this knowledge, it would be difficult for par- 
ticipants to alter their behavior in ways that would be related to the exper- 
imental conditions to which they were assigned. This approach, however, 
would still not prevent a participant from adopting one of the precon- 
ceived participant roles we discussed previously. 

Deception is another relatively common method for controlling partici- 
pant effects. The use of deception should not be taken lightly because 
there are potential ethical issues that should be considered before pro- 
ceeding. At a minimum, deception cannot jeopardize the well-being of the 
study participants, and at the conclusion of the study, researchers are usu- 
ally required to explain to the participants why deception was used. When 
researchers use deception, it usually takes the form of providing partici- 
pants with misinformation about the true hypotheses of interest or the 
focus of the study (see Christensen, 2004). Without knowledge of the true
hypotheses, it is much more difficult for participants to alter their behav- 
iors in ways that either support or refute the research hypotheses. 

Use Deception Cautiously and Only Under
Appropriate Circumstances!

The use of deception in research design is controversial and should not
be undertaken without serious consideration of the possible implications
and consequences. Certain ethical codes and federal rules and regulations
are very clear that the potential gains of using deception in research must
be balanced against potential negative consequences and effects on the
participants. Generally, the use of deception must be justified in the
context of the research study's possible scientific, educational, or applied
value. In addition, the researchers must consider other approaches and
demonstrate that the research question necessarily involves the use of
deception. Researchers must never use deception when providing
information about the possible risks and benefits of participating in the
study or in obtaining the informed consent of the research participants.

Double-blind and deception techniques are common ways of controlling
for participant effects, and these approaches operate by altering the
knowledge available to the participants. One drawback to these approaches
is that the researchers will never know for certain whether their attempts at
control were successful or what the participants were actually thinking as
they progressed through the various aspects of the research study.
Fortunately, there is one more approach for controlling for participant
effects that allows the researchers to gather information about participant
attitudes and behavior as they progress through the research study.

This third approach is straightforward and focuses on a process of in- 
quiry. The researchers can simply ask the participants about any number of 
issues related to participant effects and the overall purpose and hypothe- 
ses of the study. Typically, the researchers will ask questions related to the 
hypotheses and the natures of the roles adopted by the participants. The 
timing of the questioning can vary. For example, participants might be
asked about specific or essential aspects of the study in a retrospective 
fashion, after they have completed the study. On the other hand, the re- 
searchers might decide to question participants concurrently, throughout 
the course of the study. The choice of approach is up to the researchers. 
Regardless of timing, the intent of this approach is to allow the researchers 
to gather information directly from the participants regarding role,
motivation, and behavior (Christensen, 2004). This information can then be
controlled for in the statistical analysis or used to remove a certain
participant's data from the analysis.

ACHIEVING CONTROL THROUGH RANDOMIZATION: 
RANDOM SELECTION AND RANDOM ASSIGNMENT 

Our discussion so far has focused on approaches for controlling two com- 
mon sources of potential artifact and bias, namely, experimenter and par- 
ticipant effects. Although important, these two types of artifact and bias 
represent only a very limited number of potential sources of artifact and 
bias that should be controlled for in a research study. Other types of arti- 
fact and bias can come from a variety of sources and are unique to the re- 
search design in question. We discuss these other types of artifacts and bi- 
ases in detail in Chapter 6. 

Controlling and minimizing these sources of artifact and bias is directly 
related to the quality of any study and it bolsters the confidence we can 
have in the accuracy and relevance of the results. In an ideal world, re- 
searchers would be able to eliminate all extraneous influences from the 
contexts of their studies. That is the ultimate goal, but one that no research 
study will likely ever attain. As you can imagine, eliminating all sources of
artifact and bias is virtually impossible. Fortunately, there are other meth- 
ods that can be used to help researchers control for the influence of ex- 
traneous variables that do not require the a priori identification and elim- 
ination of all potential sources of artifact and bias. The most powerful and 
effective method for minimizing the impact of extraneous variables and 
ensuring the internal and external validity of a research study is random- 
ization. 

Randomization is a control method that helps to ensure that extraneous
sources of artifact and bias will not confound the validity of the results of 
the study. In other words, randomization helps ensure the internal validity 
of the study by helping to eliminate alternative rival hypotheses that might 
explain the results of the study. (We will discuss internal validity in detail in 
Chapter 6.) Unlike other forms of experimental control, randomization 
does not attempt to eliminate sources of artifact and bias from the study.
Instead, randomization attempts to control for the effects of extraneous
variables by ensuring that they are equivalent across all of the
experimental and control groups in the study. Randomization can be used
when selecting the participants for the study and for assigning those
participants to various conditions within the study. These two approaches
are referred to as "random selection" and "random assignment,"
respectively. As you may recall, the topic of randomization was briefly
discussed in Chapter 2 in the context of choosing study participants and
assigning those participants to groups within the study. In this section,
we will discuss randomization as a strategy for controlling artifact and
bias.

DON'T FORGET

Randomization

Randomization is a control method that helps to eliminate alternative
rival hypotheses that might otherwise explain the results of the study.
Randomization does not attempt to eliminate sources of artifact and bias
from the study. Instead, it attempts to control for the effects of
extraneous variables by ensuring that they are equivalent across all of
the experimental and control groups in the study.

We will now discuss how participant selection and assignment consti- 
tute the most effective way of controlling for and minimizing the impact 
of sources of artifact and bias. As mentioned previously, it is impossible to 
identify, let alone eliminate, all of the potential confounds that can be at 
work within a research study. Despite this, researchers can still attempt to 
minimize the effects of these confounds by using random selection and 
random assignment in participant selection and assignment procedures. 
Random selection is a control technique that increases external validity, 
and it refers to the process of selecting participants at random from a de- 
fined population of interest (Christensen, 2004; Cochran, 1977). We will 
discuss external validity in detail in Chapter 6. The population of interest is 
usually defined by the purpose of the research and the research question 
itself. For example, if the purpose of a research project is to study depres- 
sion in the elderly, then the population of interest will most likely be elderly 
people with depression.
The research question might further define the population of interest;
in this example, the research question might be the following: Does a new 
therapy technique alleviate symptoms of depression in people over the age 
of 65? In the broadest sense, the population of interest is therefore people 
with depression who are at least 65 years old. Ideally, we would be able to 
draw our sample of participants from the entire population of elderly in- 
dividuals suffering from depression, and each of these individuals would 
have an equal chance of being selected to participate in the study. The fact
that each participant has an equal chance of being selected to participate 
is the hallmark of random selection. 
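
The equal-chance idea can be illustrated with a short Python sketch. It assumes that a complete list of the population (a sampling frame) is available, which in practice is rarely the case; the frame of 500 made-up identifiers and the function name are hypothetical.

```python
import random


def random_selection(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every member of the sampling frame has the same chance of being chosen.
    """
    rng = random.Random(seed)
    return rng.sample(list(sampling_frame), sample_size)


# Hypothetical frame: identifiers for 500 people aged 65 or older with depression.
frame = [f"person_{i:04d}" for i in range(1, 501)]
sample = random_selection(frame, 30, seed=42)
print(len(sample), "participants selected, for example:", sample[:3])
```

Fixing the seed makes the draw reproducible for auditing; leaving it as `None` gives a fresh draw each time.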

Random selection helps control for extraneous influences because it 
minimizes the impact of selection biases and increases the external valid- 
ity of the study. In other words, using random selection would help en- 
sure that the sample was representative of the population as a whole. In 
this case, a sample composed of randomly selected elderly individuals 
with depression should be representative of the population of all elderly 
individuals with depression. Theoretically, the results we obtain from a 
randomly selected sample should be generalizable to all elderly individu- 
als with depression. Figure 3.1 provides a graphic representation of this 
example. 

As you might suspect, random selection in its most general form is al- 
most impossible to accomplish. Consider the resources and logistical net- 
work that would be necessary to randomly select from an entire popula- 
tion of interest. Would you want the task of randomly selecting and 
recruiting elderly, depressed individuals from across the world? From the 
United States? From the state or city in which you live? Although possible, 
random selection is a daunting prospect even when we narrow the popu- 
lation of interest. 

For this reason, researchers tend to randomly select from samples of
convenience. A sample of convenience is simply a potential source of
participants that is easily accessible to the researcher. A common example
of a sample of convenience is undergraduate psychology majors, who are
usually subtly or not so subtly coerced to participate in a wide variety of
research activities. We could conduct our study of depression and the
elderly using a readily accessible sample of convenience, rather than
attempting to sample the entire population of depressed elderly individuals.

Population of all individuals aged 65 or older suffering from depression.
Random selection: Each individual has equal chance of being chosen.

Figure 3.1 A graphic example of random selection.

In any research study, the population of interest is usually defined by the purpose of the research and
the research question itself. In our current example, the purpose of the research study is to examine
depression in the elderly, and the research question is whether a new therapy technique alleviates
symptoms of depression in people over the age of 65.

For example, we might approach two or three local geriatric facilities
and try to randomly select participants from each. In many instances, the
study might simply focus on randomly selecting participants from one
facility. The advantage of this approach is that we might actually be
able to conduct the research and gain valuable, albeit limited,
information on treating depression in the elderly. The primary disadvantage
is that this approach has a negative impact on external validity. The
sample will be smaller and likely less representative of the population of
depressed, elderly individuals, which can have a negative impact on
statistical conclusion validity.

DON'T FORGET

Sample of Convenience

A sample of convenience is simply a potential source of research
participants that is easily accessible to the researcher.

As will be discussed in Chapter 6, the aspect of quantitative evaluation 
that affects the accuracy of the conclusions drawn from the results of a 
study is called statistical conclusion validity. At its simplest level, statistical 
conclusion validity addresses the question of whether the statistical con- 
clusions drawn from the results of a study are reasonable. Although an ex- 
haustive discussion is inappropriate at this point, the results of certain sta- 
tistical analyses can be influenced by sample size. Accordingly, the use of 
an exceptionally small, or large, sample can produce misleading results 
that do not necessarily accurately represent the actual relationship be- 
tween the independent and dependent variables. 
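
A small worked calculation can make the role of sample size concrete. The sketch below is an illustration under simplifying assumptions that go beyond the text: two groups of equal size, a known standard deviation of 1 in each group, and a two-sided z test with the conventional critical value of 1.96. Under those assumptions the standard error of the difference between group means is sqrt(2 / n), so the same tiny effect produces a larger test statistic as n grows.

```python
import math


def z_for_mean_difference(effect_in_sd_units, n_per_group):
    """z statistic for a difference between two group means.

    Assumes equal group sizes and a known standard deviation of 1 per group,
    so the standard error of the difference is sqrt(2 / n_per_group).
    """
    standard_error = math.sqrt(2.0 / n_per_group)
    return effect_in_sd_units / standard_error


tiny_effect = 0.05  # a difference of one twentieth of a standard deviation
for n in (15, 200, 10000):
    z = z_for_mean_difference(tiny_effect, n)
    verdict = "exceeds 1.96" if abs(z) > 1.96 else "does not exceed 1.96"
    print(f"n per group = {n}: z = {z:.2f} ({verdict})")
```

With very large groups, even this trivially small difference crosses the conventional threshold, while with 15 per group it does not come close. This is one sense in which an exceptionally small or large sample can give a misleading picture of the underlying relationship.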

The second type of randomization control technique is random assign- 
ment, which is concerned with how participants are assigned to experi- 
mental and control conditions within the research study. The basic tenet 
of random assignment is that all participants have an equal likelihood of 
being assigned to any of the experimental or control groups (Sudman, 
1976). The basic purpose of random assignment is to obtain equivalence 
among groups across all potential confounding variables that might impact
the study. Remember that we can never eliminate all forms of artifact
and bias, and random assignment does not attempt to do this. Instead, it
seeks to distribute or equalize these potential confounds across
experimental and control groups. Let's consider our study of depression and
the elderly to illustrate the concept of random assignment.

DON'T FORGET

Random Assignment

Random assignment is a control technique in which all participants have an
equal likelihood of being assigned to any of the experimental or control
groups. Random assignment increases internal validity because it
distributes or equalizes potential confounds across experimental and control
groups. Studies that use random assignment are referred to as true
experiments, while studies that do not use random assignment are referred to
as quasi experiments. See Chapter 5 for a more detailed discussion of true
experimental and quasi-experimental research designs.

We manage to randomly select 30 participants from local geriatric fa- 
cilities. Remember that we are interested in the effects of our new therapy 
on depression. Accordingly, we form two groups: The first group receives 
the treatment, while the other receives a psychologically inert form of 
intervention that does not involve therapy. We have 30 participants who 
must now be randomly assigned to the two conditions. According to the 
tenets of random assignment, we must ensure each participant has an 
equal probability of winding up in either of the two groups. This is usually 
accomplished by using a computer-generated random selection process or 
by simply referring to a table of random numbers. (Contrast this with a 
nonrandom approach to assignment.) 

For example, taking the first 15 participants and assigning them to the 
treatment condition and the last 15 to the control condition would not be 
random assignment because the participants did not have an equal oppor- 
tunity to be placed in either of the two groups. If we proceeded this way, 
then we could be introducing a selection bias into the study. The first 15 
participants might be significantly different on a variety of factors than the 
second 15. Are the first 15 more motivated to participate because they are 
actively seeking symptom reduction? Motivation level itself might be a 
confounding variable. The second group of 15 might not be as motivated
to participate for a variety of reasons. 

Therefore, the results we obtained might be affected by these differences 
and not be a reflection of our intervention (the independent variable), even 
if we found a positive effect. If we randomly assigned the participants to 
each of the two groups, we would expect that the two groups should be 
equivalent in terms of participant characteristics and any other confound- 
ing variables, such as motivation. This equivalence is a researcher's best de- 
fense against the impact of extraneous influences on the validity of a study. 
Accordingly, random assignment should be utilized whenever possible in 
the context of research design and methodology. Figure 3.2 gives a graphic 
representation of random assignment in our example. 
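
The computer-generated approach to random assignment can be sketched as follows. This is a minimal illustration, assuming 30 participants identified by made-up IDs and two equal-sized groups; the function name is hypothetical.

```python
import random


def random_assignment(participants, seed=None):
    """Shuffle the participants, then split them into two equal-sized groups.

    Because the order is randomized first, enrollment order has no bearing
    on which group a participant ends up in.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical IDs, listed in the order participants were enrolled.
participants = [f"P{i:02d}" for i in range(1, 31)]
groups = random_assignment(participants, seed=7)

print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))

# The nonrandom alternative discussed above, shown only for contrast:
# the first 15 enrolled go to treatment and the last 15 to control.
nonrandom = {"treatment": participants[:15], "control": participants[15:]}
```

Shuffling and then splitting guarantees exactly 15 participants per group. Flipping a coin for each participant would also give everyone an equal chance, but it could leave the groups unequal in size.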

Population of all individuals aged 65 or older suffering from depression.
Random selection: Each individual has equal chance of being chosen for the study.
Sample of the population for use in the research study; in this case, a sample
of convenience from local geriatric facilities.
30 participants selected
Random assignment to treatment or control group.
15 participants treatment
15 participants control

Figure 3.2 A graphic example of random assignment.

Using our new sample of convenience, we can build on the example provided in Figure 3.1 to illustrate
the process of random assignment. We manage to randomly select 30 participants from local geriatric
facilities. We must now randomly assign them to either the therapy group or the control group.

Obviously, random selection and random assignment (collectively referred
to as "randomization") are essential techniques for minimizing the impact
of extraneous variables and ensuring the validity of the conclusions drawn
from the results of a research study. Although optimal, randomization is
not the only approach for minimizing, or controlling for, the impact of
extraneous variables. In our previous discussion, we highlighted the
theoretical and logistical difficulties inherent in trying to achieve true
random selection and random assignment. These realities often make it
difficult, if not impossible, to achieve true randomization. In some
circumstances, randomization might not be the best approach to use because
the researchers might be more interested in or concerned with the impact of
specific extraneous variables and confounds. When this is the situation,
some measure of experimental control can be achieved by holding the
influence of the variable or variables in question constant in the research
design.

DON'T FORGET

Techniques for holding variables constant, such as matching and
blocking, are not intended to be substitutes for true randomization.



Holding Variables Constant

The primary and most common method for holding the influence of a
specific variable or variables constant in a study is referred to as
matching. This assignment procedure involves matching research
participants on variables that may be related to the dependent variable
and then randomly assigning each member of the matched pair to either the
experimental condition or the control condition (Beins, 2004; Graziano &
Raulin, 2004). The application of matching is best illustrated through
example. Let's revisit the example we considered earlier regarding a new
treatment for depression in an elderly population.

DON'T FORGET

Matching

This assignment procedure involves matching research participants on
variables that may be related to the dependent variable and then randomly
assigning each member of the matched pair to either the experimental
condition or the control condition.

In our previous discussion, we randomly assigned participants to either 
an experimental or a control condition. We will use the same basic premise 
in this example, in which we are still interested in knowing whether our 
treatment will produce greater reduction of symptoms of depression than 
will receiving an inert intervention that does not involve therapy. As we 
previously discussed, we sampled from the population in the same way, 
and still ended up using a sample of convenience; we then randomly as- 
signed the participants to the experimental or control group. 

Now let's add another layer of complexity to the scenario. We still want 
to know whether our new treatment is effective, but we might also be in- 
terested in the potential impact of other specific, potentially confounding 
variables. Consider, for example, that therapeutic outcome can sometimes 
be influenced by intelligence. Difficulties with memory and other modes 
of cognitive functioning might also significantly impact the outcome of 
therapy when working with elderly clients. 

Given this, the researchers decide to control for the effects of memory 
in the study. Accordingly, the methodology is altered to include a general 
measure of memory functioning that demonstrates adequate reliability 
and validity. In practice, this assessment would have to be given before 
matching or assignment could occur. 

The first step in the matching procedure would be to create matched 
pairs of participants based on their memory screening score. In this case, 
we have a two-group design — therapy versus an inert treatment (control 
group). The researchers would take the two highest scores on the mem- 
ory test and those participants would constitute a matched pair. Next, 
this matched pair would be split and each participant randomly assigned 
such that one member ends up in the experimental group and one mem- 
ber ends up in the control group. In other words, each participant in this 
first matched pair still has an equal likelihood of being assigned to either 
the treatment or the control condition. The process is repeated, so the 
next two highest scores on the memory screen would be matched and 
then randomly assigned to the two conditions. The process would continue
until each of the participants was assigned to one of the two
conditions.
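The matched-pair procedure just described can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the participant IDs, memory scores, and the function name `matched_pair_assignment` are hypothetical, and the choice to leave an unpaired participant unassigned when the sample size is odd is an assumption.

```python
import random

def matched_pair_assignment(scores, seed=None):
    """scores maps participant ID -> memory screening score.

    Ranks participants from highest to lowest score, pairs neighbours,
    and randomly splits each pair between the two conditions.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    experimental, control = [], []
    # With an odd sample size the last participant stays unassigned (assumption).
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)              # equal chance of either condition
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control

# Hypothetical memory screening scores.
scores = {"P01": 92, "P02": 88, "P03": 75, "P04": 74,
          "P05": 61, "P06": 60, "P07": 45, "P08": 43}
experimental, control = matched_pair_assignment(scores, seed=1)
for e, c in zip(experimental, control):
    print(f"{e} ({scores[e]}) vs. {c} ({scores[c]})")
```

Each printed row is one matched pair, so the two conditions end up with closely similar memory scores while chance still decides which member of a pair receives the treatment.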

Note that matching can be used with more than two groups. With 
three groups, the three highest scores would be randomly assigned, with 
four groups the four highest scores, and so on. Similarly, participants can 
be matched on more than one variable. In this case, for example, we 
might also be interested in gender as a potentially confounding variable. 
The researchers could take the two highest male memory scores and ran- 
domly assign each participant such that one is in the experimental and the 
other in the control group, and then repeat the procedure for females 
based on memory score. Ultimately, the goal is the same: to make the 
experimental and control conditions equivalent on the variables of 
interest. In our example, the researchers could safely assume that the two 
groups had equivalent representation in terms of gender and memory 
functioning. 
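Matching on two variables, or across more than two conditions, follows the same logic. The sketch below assumes a hypothetical list of (ID, gender, memory score) records and a hypothetical function name; it ranks participants within each gender, takes as many participants at a time as there are conditions, and shuffles the conditions across each matched set. Leaving leftover participants unassigned is an assumption.

```python
import random
from collections import defaultdict

def matched_assignment(participants, conditions, seed=None):
    """participants: list of (participant_id, gender, memory_score)."""
    rng = random.Random(seed)
    by_gender = defaultdict(list)
    for pid, gender, score in participants:
        by_gender[gender].append((score, pid))
    k = len(conditions)
    assignment = {}
    for members in by_gender.values():
        members.sort(reverse=True)               # highest memory score first
        # Participants left over after the last full set stay unassigned.
        for start in range(0, len(members) - k + 1, k):
            matched_set = members[start:start + k]
            order = list(conditions)
            rng.shuffle(order)                   # random condition per member
            for (_, pid), condition in zip(matched_set, order):
                assignment[pid] = condition
    return assignment

# Hypothetical records: (ID, gender, memory score).
participants = [("P01", "M", 90), ("P02", "M", 85), ("P03", "M", 62),
                ("P04", "M", 60), ("P05", "F", 88), ("P06", "F", 86),
                ("P07", "F", 55), ("P08", "F", 51)]
assignment = matched_assignment(participants, ["experimental", "control"], seed=7)
print(assignment)
```

Passing three or four condition names instead of two would form matched sets of three or four, as the text describes.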

Although matching is one of the more common approaches for holding the
influence of extraneous variables constant, there are other approaches
that can be used. The first of these approaches is referred to as
"blocking." Unlike matching, which is concerned with holding extraneous
variables constant, blocking is an approach that allows the researchers
to determine what specific impact the variable in question is having on
the dependent variable (Christensen, 1988). In essence, blocking takes a
potentially confounding variable and examines it as another independent
variable.

DON'T FORGET

Blocking

This assignment technique allows the researchers to determine what
specific impact the variable in question is having on the dependent
variable by taking a potentially confounding variable and examining it as
another independent variable.

An example should help clarify how blocking is actually implemented in
the context of a research study. Let's return once again to our treatment
effectiveness study for depression in the elderly. In the original design,
we were interested in whether the new treatment was effective for reducing
symptoms of depression in the elderly. There were two groups: one group
received the new treatment and the other group received an inert or
control intervention.

In this example, the independent variable is the new treatment and the 
dependent variable is the symptom level of depression. Blocking allows 
for a potentially confounding variable to become an independent variable. 
We will use memory as our potentially confounding or blocking variable. 
In other words, we not only want to know whether the treatment is effec- 
tive, we also want to know whether memory functioning has an impact on 
therapeutic effectiveness. Therefore, the researchers might first divide the 
participants into two categories based on memory score. For instance, 
scores below a certain cutoff number would constitute the "impaired
memory" group and scores above the cutoff number would constitute the
"adequate memory" group. The participants within each memory category
would then be randomly assigned to either the experimental group or the
control group. Note that
now there are two independent variables, therapy and memory, and four 
groups instead of two groups in our study. In the original design, there 
were only two groups, experimental and control. Now the researchers 
have four groups: therapy/impaired memory, therapy/adequate memory,
no therapy/impaired memory, and no therapy/adequate memory. As you 
can see, the researchers can now compare the performance of these 
groups to determine whether memory had an effect on therapeutic effec- 
tiveness. Without the use of blocking, these additional comparisons would 
not have been possible. 
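The blocking design above can be sketched as a two-step procedure: classify by the blocking variable, then randomize within each block. The scores, the cutoff of 50, and the function name are hypothetical, and placing a score exactly equal to the cutoff in the "adequate memory" block is an assumption the text does not address.

```python
import random

def block_then_assign(memory_scores, cutoff, seed=None):
    """Return the four cells of a 2 x 2 (therapy x memory) design."""
    rng = random.Random(seed)
    blocks = {"impaired memory": [], "adequate memory": []}
    for pid, score in memory_scores.items():
        # Scores equal to the cutoff go to "adequate memory" (assumption).
        level = "impaired memory" if score < cutoff else "adequate memory"
        blocks[level].append(pid)
    cells = {}
    for level, members in blocks.items():
        rng.shuffle(members)           # random assignment within the block
        half = len(members) // 2
        cells[("therapy", level)] = members[:half]
        cells[("no therapy", level)] = members[half:]
    return cells

# Hypothetical memory screening scores.
memory_scores = {"P01": 38, "P02": 71, "P03": 44, "P04": 65,
                 "P05": 29, "P06": 80, "P07": 47, "P08": 58}
cells = block_then_assign(memory_scores, cutoff=50, seed=3)
for cell, members in cells.items():
    print(cell, sorted(members))
```

Comparing outcome scores across the four cells is what lets the researchers ask whether memory functioning changes the effect of therapy, rather than only whether therapy works overall.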

Another selection approach for controlling extraneous variables re- 
quires the researchers to hold the extraneous variable in question constant 
by selecting a sample that is very uniform or homogeneous on the variable 
of interest. For example, the researchers might first select only those el- 
derly individuals with intact memory functioning for the therapy study, 
most likely based on a pretest cutoff score. All participants who did not 
meet the cutoff score would be excluded from the study. The participants
would then be randomly assigned to the different experimental conditions.
The rationale behind this approach is relatively straightforward.
Specifically, if all of the participants are roughly equivalent on the variable 
under consideration (e.g., memory), then the potential impact of the vari- 
able is consistent across all of the groups and cannot operate as a con- 
found. Although this is an effective way of eliminating potential con- 
founds, it has a negative effect on the generalizability of the results of a 
study. In this example, any results would pertain only to elderly individu- 
als with adequate memory functioning and not to a broader representation 
of elderly people suffering from depression. 

Statistical Approaches 

The final method for attaining control of extraneous variables that we will 
discuss involves statistical analyses rather than the selection and assign- 
ment of participants. Rapid Reference 3.7 lists the methods we'll describe 
in some detail here. 

Rapid Reference 3.7

Statistical Approaches for Holding Extraneous Variables Constant

• Descriptive statistics

• t-test

• ANOVA

• ANCOVA

• Partial correlation

One statistical approach for determining equivalence between groups
is to use simple analyses of means and standard deviations for the
variables of interest for each group in the study. A mean is simply an
average score, and a standard deviation is a measure of variability
indicating the average amount that scores vary from the mean. (These
concepts will be discussed in more detail in Chapter 7.) We could use
means and standard deviations to obtain a snapshot of group scores on a
variable of interest, such as memory.

Let's assume we randomly assign our elderly participants to our two
original groups and that we are still interested in memory functioning as
a potential confounding variable. Theoretically, random assignment should
make the two groups equivalent in terms of memory functioning. If we were
cynical (or perhaps obsessive-compulsive), we could check the means and
standard deviations for memory scores for both groups to see if they were
consistent. For some researchers, eyeballing the results would be
sufficient; in other words, if the means and standard deviations were
close for both groups, we would assume that there was no confound. For
others, a statistical test (t-test for two groups, or analysis of variance
[ANOVA] for three or more groups) to compare the means would be run to
determine whether there was a statistically significant difference between
the groups on the variable of interest (Howell, 1992). If significant
differences were found, then the groups would not be equivalent on the
variable of interest, suggesting a possible confound. This approach can be
particularly useful when random assignment is not possible or practical.
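As a rough sketch of this equivalence check, the code below computes the mean and standard deviation of memory scores for each group and then an independent-samples t statistic. The scores are hypothetical, the function names are illustrative, and the pooled-variance form of the t-test (which assumes roughly equal group variances) is one common choice rather than the only one.

```python
from math import sqrt
from statistics import mean, stdev

def describe(scores):
    """Descriptive snapshot: sample size, mean, and sample standard deviation."""
    return {"n": len(scores), "mean": mean(scores), "sd": stdev(scores)}

def pooled_t(group_a, group_b):
    """Independent-samples t statistic (pooled variance) and its df."""
    na, nb = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical memory scores after random assignment.
treatment_memory = [72, 65, 80, 58, 77, 69]
control_memory = [71, 66, 79, 60, 75, 73]

print(describe(treatment_memory))
print(describe(control_memory))
t, df = pooled_t(treatment_memory, control_memory)
print(f"t = {t:.3f}, df = {df}")
```

With df = 10, the two-tailed critical value at the .05 level is about 2.23, so a t statistic much smaller than that in absolute value gives no evidence that the groups differ on memory.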

There are two other statistical approaches that can be used to minimize 
the impact of or to control for the influence of extraneous variables. The 
first is referred to as "analysis of covariance," or ANCOVA, and it is used 
during the data analysis phase (Huitema, 1980). This statistical technique 
adjusts scores so that participant scores are equalized on the measured 
variable of interest. In other words, this statistical technique controls for 
individual differences and adjusts for those differences among nonequiv- 
alent groups (see Pedhazur & Schmelkin, 1991; Winer, 1971). 
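The adjustment that ANCOVA performs can be illustrated with the standard adjusted-mean formula: each group's outcome mean is shifted by the pooled within-group regression slope times the distance between that group's covariate mean and the grand covariate mean. The sketch below covers only this adjustment step, not the accompanying significance test, and the data are hypothetical values constructed so that the slope is exactly -0.5.

```python
from statistics import mean

def ancova_adjusted_means(groups):
    """groups maps a group name to a list of (covariate, outcome) pairs.

    Returns the pooled within-group slope and each group's adjusted mean.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sp_within = 0.0   # pooled within-group sum of cross-products
    ss_within = 0.0   # pooled within-group sum of squares of the covariate
    group_means = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sp_within += sum((x - mx) * (y - my) for x, y in pairs)
        ss_within += sum((x - mx) ** 2 for x, _ in pairs)
        group_means[name] = (mx, my)
    slope = sp_within / ss_within
    adjusted = {name: my - slope * (mx - grand_x)
                for name, (mx, my) in group_means.items()}
    return slope, adjusted

# Hypothetical (memory score, depression score) pairs.
groups = {
    "therapy": [(60, 10), (70, 5), (80, 0)],
    "control": [(50, 25), (60, 20), (70, 15)],
}
slope, adjusted = ancova_adjusted_means(groups)
print(slope, adjusted)
```

In this constructed example the raw group means differ by 15 points, but part of that gap reflects the therapy group's higher memory scores; after adjustment the difference is 10 points.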

A partial correlation is another statistical technique that can be used 
to control for extraneous variables. In essence, a partial correlation is a 
correlation between two variables after one or more variables have 
been mathematically controlled for and partialed out (Pedhazur & 
Schmelkin, 1991). For example, a partial correlation would allow us to 
look at the relationship between memory and symptom level while 
mathematically eliminating the impact of another possibly confounding 
variable such as intelligence or level of motivation. This assumes, of 
course, that appropriate data on each variable have been collected and 
can be included in the analyses. These statistical approaches can be used 
regardless of whether random selection and assignment were employed 
in the study. 
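For a single controlled variable, the partial correlation can be computed directly from the three pairwise correlations using the first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below applies it to hypothetical memory, symptom, and motivation scores; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys with zs partialed out (first order)."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical scores for six participants.
memory = [55, 60, 65, 70, 75, 80]
symptoms = [30, 28, 27, 22, 20, 18]
motivation = [3, 5, 4, 6, 8, 7]

print(f"zero-order r = {pearson_r(memory, symptoms):.3f}")
print(f"partial r    = {partial_r(memory, symptoms, motivation):.3f}")
```

Comparing the zero-order and partial values shows how much of the memory-symptom relationship remains once motivation is held constant statistically.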
SUMMARY

This chapter discussed general strategies and controls that can be used to 
reduce the impact of artifact and bias in any given research design. These 
basic strategies are particularly useful because they help reduce the impact 
of unwanted bias even when the researcher is not aware that bias is pre- 
sent. The implementation of these basic strategies ultimately reduces 
threats to validity and bolsters the confidence that we can place in a study's 
findings. 



TEST YOURSELF

1. Theoretically, a sample is most representative of the total population
when random ________ is used.

2. Deception can be used in any aspect of the study as long as the benefits of
the study outweigh the potential risks. True or False?

3. The most effective way to equalize the impact of potentially confounding
variables and ensure the internal validity of the study is through ________.

4. Research participants can assume various roles that can influence the
results of a study. True or False?

5. Research studies that are quasi-experimental are preferred over true
experiments because they utilize random assignment. True or False?

Answers: 1. selection; 2. False (There are ethical prohibitions against using deception under
certain circumstances.); 3. random assignment; 4. True; 5. False (True experiments utilize
random assignment.)

DATA COLLECTION, ASSESSMENT METHODS, AND MEASUREMENT STRATEGIES



The importance of measurement in research design cannot be over- 
stated. Even the most well-designed studies will prove useless if 
inappropriate measurement strategies are used in the data collec- 
tion stages. This chapter will discuss issues related to data collection and 
measurement strategies in research design. To be clear, this chapter is not 
meant to be an exhaustive treatment of the topic. Indeed, this area of re- 
search design could be, and has been, the topic of a number of in-depth 
texts devoted solely to the subject. Rather, this chapter is meant to high- 
light important concepts related to measurement and data collection. We 
start with general issues related to the importance of measurement in re- 
search design. Next, we consider specific scales of measurement and how 
they are related to various statistical approaches and techniques. Finally, 
we turn to psychometric considerations and specific measurement strate- 
gies for collecting data. 

MEASUREMENT 

Measurement is often viewed as being the basis of all scientific inquiry,
and measurement techniques and strategies are therefore an essential
component of research methodology. A critical juncture between scientific
theory and application, measurement can be defined as a process through
which researchers describe, explain, and predict the phenomena and
constructs of our daily existence (Kaplan, 1964; Pedhazur & Schmelkin,
1991). For example, we measure how long we have lived in years, our
financial success in dollars, and the distance between two points in
miles. Important
life decisions are based on performance on standardized tests that mea- 
sure intelligence, aptitude, achievement, or individual adjustment. We 
predict that certain things will happen as we age, become more educated, 
or make other significant lifestyle changes. In short, measurement is as im- 
portant in our daily existence as it is in the context of research design. 

The concept of measurement is important in research studies in two 
key areas. First, measurement enables researchers to quantify abstract 
constructs and variables. As you may recall from Chapter 2, research is 
usually conducted to explore the relationship between independent and 
dependent variables. Variables in a research study typically must be oper- 
ationalized and quantified before they can be properly studied (Kerlinger, 
1992). As was discussed in Chapter 2, an operational definition takes a vari- 
able from the theoretical or abstract to the concrete by defining the vari- 
able in the specific terms of the actual procedures used by the researcher 
to measure or manipulate the variable. For example, in a study of weight 
loss, a researcher might operationalize the variable "weight loss" as a de- 
crease in weight below the individual's starting weight on a particular date. 

The process of quantifying the variable would be relatively simple in
this situation: for example, the amount of weight lost in pounds and
ounces during the course of the research study. Without measurement,
researchers would be able to do little else but make unsystematic
observations of the world around us.

DON'T FORGET

Importance of Measurement in Research Design

Measurement is important in research design in two critical areas.
First, measurement allows researchers to quantify abstract constructs and
variables. Second, the level of statistical sophistication used to analyze
data derived from a study is directly dependent on the scale of
measurement used to quantify the variables of interest.

Second, the level of statistical sophistication used to analyze data
derived from a study is directly dependent on the scale of measurement
used to quantify the variables of interest (Anderson, 1961). There are two
basic categories of data: nonmetric and metric. Nonmetric data (also
referred to as qualitative data) are typically attributes,
characteristics, or categories that describe an individual and cannot be
quantified. Metric data (also referred to as quantitative data) exist in
differing amounts or degrees, and they reflect relative quantity or
distance. Metric data allow researchers to examine amounts and magnitudes,
while nonmetric data are used predominantly as a method of describing and
categorizing (Hair, Anderson, Tatham, & Black, 1995).

Nonmetric Data vs. Metric Data

Nonmetric data (which cannot be quantified) are predominantly used to
describe and categorize. Metric data are used to examine amounts and
magnitudes.



Scales of Measurement 

There are four main scales of measurement subsumed under the broader
categories of nonmetric and metric measurement: nominal scales, ordinal
scales, interval scales, and ratio scales. Nominal and ordinal scales are
nonmetric measurement scales. Nominal scales (see Rapid Reference 4.1) are
the least sophisticated type of measurement and are used only to
qualitatively classify or categorize. They have no absolute zero point and
cannot be ordered in a quantitative sequence, and there is no equal unit
of measurement between categories. In other words, the numbers assigned to
the variables have no mathematical meaning beyond describing the
characteristic or attribute under consideration; they do not imply amounts
of an attribute or characteristic. This makes it impossible to conduct
standard mathematical operations such as addition, subtraction, division,
and multiplication. Common examples of nominal scale data include gender,
religious and political affiliation, place of birth, city of residence,
ethnicity, marital status, eye and hair color, and employment status.
Notice that each of these variables is purely descriptive and cannot be
manipulated mathematically.

Rapid Reference 4.1

Distinguishing Characteristics of Nominal Measurement Scales and Data

• Used only to qualitatively classify or categorize, not to quantify.

• No absolute zero point.

• Cannot be ordered in a quantitative sequence.

• Impossible to use to conduct standard mathematical operations.

• Examples include gender, religious and political affiliation, and
marital status.

• Purely descriptive and cannot be manipulated mathematically.

The second type of nonmetric measurement scale is known as the ordinal
scale. Unlike the nominal scale, ordinal scale measurement (see Rapid
Reference 4.2) is characterized by the ability to measure a variable in
terms of both identity and magnitude. This makes it a higher level of
measurement than the nominal scale because the ordinal scale allows for
the categorization of a variable and its relative magnitude in relation to
other variables. Variables can be ranked in relation to the amount of the
attribute possessed. In simpler terms, ordinal scales represent an
ordering of variables, with some number representing more than another.

Rapid Reference 4.2

Distinguishing Characteristics of Ordinal Measurement Scales and Data

• Build on nominal measurement.

• Categorize a variable and its relative magnitude in relation to other
variables.

• Represent an ordering of variables with some number representing
more than another.

• Information about relative position but not the interval between the
ranks or categories.

• Qualitative in nature.

• Example would be finishing position of runners in a race.

• Lack the mathematical properties necessary for sophisticated statistical
analyses.

One way to think about ordinal data is by using the concept of greater 
than or less than, which incidentally also highlights the main weakness of 
ordinal data. Notice that knowing whether something has more or less 
of an attribute does not quantify how much more or less of the attribute or 
characteristic there is. We therefore know nothing about the differences 
between categories or ranks; instead, we have information about relative 
position, but not the interval between the ranks or categories. Like nomi- 
nal data, ordinal data are qualitative in nature and do not possess the math- 
ematical properties necessary for sophisticated statistical analyses. A com- 
mon example of an ordinal scale is the finishing positions of runners in a 
race. We know that the first runner to cross the line did better than the 
fourth, but we do not know how much better. We would know how much 
better only if we knew the time it took each runner to complete the race. 
This requires a different level or scale of measurement, which leads us to 
a discussion of the two metric scales of measurement. 
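The runner example can be made concrete with a short sketch. The names and finishing times below are hypothetical; the point is only that converting times to finishing positions keeps the order but discards the size of the gaps.

```python
# Hypothetical finishing times in seconds (a metric measurement).
times = {"Ana": 612.4, "Ben": 613.0, "Cal": 655.9, "Dee": 701.2}

# Ordinal view: finishing positions only.
ranked = sorted(times, key=times.get)
positions = {runner: place for place, runner in enumerate(ranked, start=1)}
print(positions)

# The positions are equally spaced (1, 2, 3, 4), but the underlying gaps are not.
gap_first_second = times[ranked[1]] - times[ranked[0]]
gap_second_third = times[ranked[2]] - times[ranked[1]]
print(f"1st to 2nd: {gap_first_second:.1f} s, 2nd to 3rd: {gap_second_third:.1f} s")
```

From the positions alone there is no way to tell that the first two runners were nearly tied while the third finished well behind them.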

Interval and ratio scales are the two types of metric measurement scales, 
and are quantitative in nature. Collectively, they represent the most so- 
phisticated level of measurement and lend themselves well to sophisti- 
cated and powerful statistical techniques. The interval scale (see Rapid Ref- 
erence 4.3) of measurement builds on ordinal measurement by providing 
information about both order and distance between values of variables. 
The numbers on an interval scale are scaled at equal distances, but there is 
no absolute zero point. Instead, the zero point is arbitrary. Because of this, 
addition and subtraction are possible with this level of measurement, but 
the lack of an absolute zero point makes division and multiplication im- 
possible. It is perhaps best to think of the interval scale as related to our 
traditional number system, but without a zero. On either the Fahrenheit or 
Celsius scale, zero does not represent a complete absence of temperature, 
yet the quantitative or measurement difference between 10 and 20 degrees
is the same as the difference between 40 and 50 degrees. There might be a
qualitative difference between the two temperature ranges, but the
quantitative difference is identical: 10 units or degrees.

Rapid Reference 4.3

Distinguishing Characteristics of Interval Measurement Scales and Data

• Quantitative in nature.

• Build on ordinal measurement.

• Provide information about both order and distance between values of
variables.

• Numbers scaled at equal distances.

• No absolute zero point; zero point is arbitrary.

• Addition and subtraction are possible.

• Examples include temperature measured in Fahrenheit and Celsius.

• Lack of an absolute zero point makes division and multiplication
impossible.

The second type of metric measurement scale is the ratio scale of
measurement (see Rapid Reference 4.4). The properties of the ratio scale
are identical to those of the interval scale, except that the ratio scale
has an absolute zero point, which means that all mathematical operations
are possible. Numerous examples of ratio scale data exist in our daily
lives. Money is a pertinent example. It is possible to have no (or zero)
money, such as a zero balance in a checking account. This is an example of
an absolute zero point. Unlike with interval scale data, multiplication
and division are now possible. Ten dollars is 10 times more than 1 dollar,
and 20 dollars is twice as much as 10 dollars. If we have 100 dollars and
give away half, we are left with 50 dollars, which is 50 times more than
1 dollar. Other examples include height, weight, and time. Ratio data are
the highest level of measurement and allow for the use of sophisticated
statistical techniques.

Rapid Reference 4.4

Distinguishing Characteristics of Ratio Measurement Scales and Data

• Identical to the interval scale, except that they have an absolute zero
point.

• Unlike with interval scale data, all mathematical operations are
possible.

• Examples include height, weight, and time.

• Highest level of measurement.

• Allow for the use of sophisticated statistical techniques.
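A short arithmetic sketch can show why the zero point matters. On an interval scale the division can of course be carried out, but the result is not meaningful because it changes when the unit changes; on a ratio scale it does not. The conversion function name and the specific values below are chosen for illustration.

```python
def celsius_to_fahrenheit(c):
    """Standard Celsius-to-Fahrenheit conversion."""
    return c * 9 / 5 + 32

# Interval scale: equal differences are meaningful.
diff_low = 20 - 10     # 10 degrees
diff_high = 50 - 40    # 10 degrees

# Interval scale: ratios depend on the arbitrary zero point.
ratio_c = 20 / 10                                                 # 2.0
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)   # 68 / 50 = 1.36

# Ratio scale: a true zero makes ratios survive a change of unit.
ratio_dollars = 20 / 10       # 2.0
ratio_cents = 2000 / 1000     # 2.0

print(diff_low == diff_high)         # True
print(ratio_c, ratio_f)              # 2.0 1.36
print(ratio_dollars == ratio_cents)  # True
```

The same two temperatures look "twice as hot" in Celsius and only 1.36 times as hot in Fahrenheit, whereas 20 dollars is twice 10 dollars whether counted in dollars or cents.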



PSYCHOMETRIC CONSIDERATIONS 

A Note on Measurement and Operational Definitions 

The assessment instruments and methods used in all forms of research 
should meet certain minimum psychometric requirements. As we will dis- 
cuss later in this chapter, there is a wide variety of measurement strategies 
and techniques that are common in research design. As with considera- 
tions in research design, the research question and the constructs under 
study usually drive the choice of measurement technique or strategy. More 
specifically, the researcher is usually concerned with operationalizing and 
quantifying the independent and dependent variables through some type 
of measurement strategy. For example, depression can be operationalized 
through measurement by using the score from a standardized instrument. 
Similarly, a score on a personality trait measure might be used to opera- 
tionalize a particular personality trait. Recall from Chapter 2 that an oper- 
ational definition is simply the definition of a variable in terms of the 
actual procedures used to measure or manipulate it (Graziano & Raulin, 
2004). Given this definition, it is easy to see that operational definitions are
essential in research because they help to quantify abstract concepts. Op- 
erationalization can be easily accomplished through measurement. 

For example, a researcher studying a new treatment for depression 
would be interested in operationalizing what depression is and how it is 
measured, or quantified. Although this might seem self-evident at first,
consider all of the potential ways that depression could be operationalized
and measured. Is it a score on an instrument designed to measure depres- 
sion? Is it the presence or absence of certain symptoms as determined 
through a structured clinical interview? Could it be based on behavioral 
observations of activity level? This merely scratches the surface of the pos- 
sible operational definitions of a single variable. Let's stay with the same 
example and consider how we would measure improvement in level of de- 
pression. After all, if we are interested in a new treatment for depression, 
we will have to see whether our participants improve, remain the same, or 
deteriorate after receiving the intervention. So, how should we quantify 
improvement? Depending on the operational definition, improvement 
could be determined by observing reduced scores on a depression assess- 
ment, reduced symptoms on a diagnostic interview, observations of in- 
creased activity level, or perhaps observations of two or all of these in- 
dices. 

Ultimately, the choice lies with the researcher, the nature of the research 
question to be answered, the availability of resources, and the availability 
of measurement techniques and strategies for the construct of interest. In 
any event, the accuracy and quality of the data collected from the study are 
directly dependent on the measurement procedures and related opera- 
tional definitions used to define and measure the constructs of interest. 
Regardless of the approach used, measurement approaches and instru- 
ments should meet certain minimum psychometric requirements that help 
ensure the accuracy and relevance of the measurement strategies used in a 
study. Reliability and validity are the most common and important psy- 
chometric concepts related to assessment-instrument selection and other 
measurement strategies. 

Reliability and Validity and Their Relationship to Measurement 

At its most general level, reliability (see Rapid Reference 4.5) refers to the 
consistency or dependability of a measurement technique (Andrich, 1981; 
Leary, 2004). More specifically, reliability is concerned with the consistency 
or stability of the score obtained from a measure or assessment technique 
over time and across settings or conditions (Anastasi & Urbina, 1997; 
White & Saltz, 1957). If the measurement is reliable, then there is less 
chance that the obtained score is due to random factors and measurement 
error. Measurement error is uncontrolled variance that distorts scores and 
observations so that they no longer accurately represent the construct in 
question. Scores obtained from most forms of data collection are subject to 
measurement error. Essentially, this means that any score obtained consists 
of two components. The first component is the true score, which is the score 
that would have been obtained if the measurement strategy were perfect and 
error free. The second component is measurement error, which is the portion 
of the score that is due to distortion and imprecision from a wide variety of 
potential factors, such as a poorly designed test, situational factors, and 
mistakes in the recording of data (Leary, 2004). 

Rapid Reference 4.5 

Measurement of Reliability 

Reliability refers to the consistency or dependability of a measurement 
technique, and it is concerned with the consistency or stability of the 
score obtained from a measure or assessment over time and across settings 
or conditions. If the measurement is reliable, then there is less chance 
that the obtained score is due to random factors and measurement error. 

So, how do we know if a measurement method or instrument is reliable? 
In its simplest form, reliability is concerned with the relationship between 
independently derived sets of scores, such as the scores on an assessment 
instrument on two separate occasions. Accordingly, reliability is usually 
expressed as a correlation coefficient, which is a statistical analysis that tells 
us something about the relationship between two sets of scores or variables. 
Adequate reliability exists when the correlation coefficient is .80 or higher. 
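The two-component view of a score can be made concrete with a small simulation. The sketch below is only an illustration under assumed values: the function names (`pearson_r`, `simulate_retest`), the score distribution, and the error sizes are hypothetical and do not come from any particular instrument. It builds each observed score as a true score plus random error, "administers" the measure twice, and reports the correlation between the two sets of scores.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_retest(error_sd, n=2000, seed=0):
    """Correlate two simulated administrations of a hypothetical measure.

    Every observed score is a stable true score plus random measurement
    error; only the size of the error (error_sd) varies between runs.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]  # assumed stable construct
    time1 = [t + rng.gauss(0, error_sd) for t in true_scores]
    time2 = [t + rng.gauss(0, error_sd) for t in true_scores]
    return pearson_r(time1, time2)


low_error = simulate_retest(error_sd=2)    # small measurement error
high_error = simulate_retest(error_sd=10)  # error as large as true-score spread

print(f"small error: r = {low_error:.2f}")
print(f"large error: r = {high_error:.2f}")
```

With these assumed settings, the first correlation should fall well above the .80 benchmark and the second well below it, even though the underlying true scores are identical in both runs.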
Although all measures contain error, the more reliable the method or 
instrument, the less likely it is that these influences will affect the accuracy 
of the measurement (see Rapid Reference 4.6). Let's consider an example. 
In psychology, personality is a construct that is thought to be relatively 
stable. If we were to assess a person's personality traits using an objective, 
standardized instrument, we would not expect the results to change 
significantly if we administered the same instrument a week later. If the 
results did vary considerably, we might wonder whether the instrument that 
we used was reliable (see Rapid Reference 4.7). Notice that we chose this 
example because personality is a relatively stable construct that we would 
not expect to change drastically over time. Keep in mind that some 
constructs and phenomena, such as emotional states, can vary considerably 
with time. We would expect reliability to be high when measuring a stable 
construct, but not when measuring a transient one. 

Rapid Reference 4.6 

Strategies for Increasing Reliability and Minimizing Measurement Error 

There are numerous practical approaches that can be used alone or in 
combination to minimize the impact of measurement error. These suggestions 
should be considered during the design phase of the study and 
should focus on data collection and measurement strategies used to measure 
the independent and dependent variables. First, the administration of 
the instrument or measurement strategy should be standardized: all 
measurement should occur in the most consistent manner possible. In 
other words, the administration of measurement strategies should be 
consistent across all of the participants taking part in the study. Second, 
the researchers should make certain that the participants understand the 
instructions and content of the instrument or measurement strategy. If 
participants have difficulty understanding the purpose or directions of the 
measure, they might not answer in an accurate fashion, which has the 
potential to bias the data. Third, every researcher involved in data collection 
should be thoroughly trained in the use of the measurement strategy. 
There should also be ample opportunity for practice before the study begins 
and repeated training over the course of the study to maintain consistency. 
Finally, every effort should be made to ensure that data are 
recorded, compiled, and analyzed accurately. Data entry should be closely 
monitored and audits should be conducted on a regular basis (Leary, 2004). 
Rapid Reference 4.7 

Assessing Reliability 

Reliability can be determined through a variety of methods: 

• Test-retest reliability refers to the stability of test scores over time 
and involves repeating the same test on at least one other occasion. For 
example, administering the same measure of academic achievement on 
two separate occasions 6 months apart is an example of this type of 
reliability. The interval of time between administrations should be 
considered with this form of reliability because test-retest correlations tend 
to decrease as the time interval increases. 

• Split-half reliability refers to the administration of a single test that 
is divided into two equal halves. For example, a 60-question aptitude 
test that purports to measure one aspect of academic achievement 
could be broken down into two separate but equal tests of 30 items 
each. Theoretically, the items on both forms measure the same construct. 
This approach is much less susceptible to time-interval effects 
because all of the items are administered at the same time and then 
split into separate item pools afterward. 

• Alternate-form reliability is expressed as the correlation between 
different forms of the same measure where the items on each measure 
represent the same item content and construct. This approach requires 
two different forms of the same instrument, which are then administered 
at different times. The two forms must cover identical content 
and have a similar difficulty level. The two test scores are then correlated. 

• Interrater reliability is used to determine the agreement between 
different judges or raters when they are observing or evaluating the 
performance of others. For example, assume you have two evaluators 
assessing the acting-out behavior of a child. You operationalize "acting-out 
behavior" as the number of times that the child refuses to do his or 
her schoolwork in class. The extent to which the evaluators agree on 
whether or when the behavior occurs reflects this type of reliability. 
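Three of the methods listed above can be sketched in a few lines of code. Everything in this example is hypothetical: the scores, the item responses, and the helper names (`pearson_r`, `split_half_totals`, `percent_agreement`) are made up for illustration. Two choices are assumptions rather than requirements: the test is split into odd- and even-numbered items, and interrater reliability is summarized as simple percent agreement across observation intervals.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def split_half_totals(item_responses):
    """Total the odd- and even-numbered items separately for each participant."""
    odd = [sum(items[0::2]) for items in item_responses]
    even = [sum(items[1::2]) for items in item_responses]
    return odd, even


def percent_agreement(rater_a, rater_b):
    """Proportion of observation intervals on which two raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Test-retest: the same five participants measured on two occasions.
time1 = [12, 18, 25, 31, 40]
time2 = [14, 17, 27, 30, 41]
test_retest = pearson_r(time1, time2)

# Split-half: one row per participant, one column per item (1 = correct).
items = [
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
]
odd_half, even_half = split_half_totals(items)
split_half = pearson_r(odd_half, even_half)

# Interrater: 1 = "refused schoolwork" coded in that interval, 0 = not coded.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
agreement = percent_agreement(rater_a, rater_b)

print(f"test-retest r = {test_retest:.2f}")
print(f"split-half r  = {split_half:.2f}")
print(f"interrater agreement = {agreement:.0%}")
```

Alternate-form reliability would use the same `pearson_r` call, with scores from the two forms in place of `time1` and `time2`.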
[Validity refers to] what the test or measurement [strategy measures and 
how well it] does so. Conceptually, validity seeks to answer the following 
question: "Does the instrument or measurement approach measure 
what it is supposed to measure?" 

[Validity] is another critical aspect of measurement [...] as part of an 
overall measurement strategy. Whereas reliability refers to the consistency 
of the measure, validity focuses on what the test or measurement strategy 
measures and how well it does so (Anastasi & Urbina, 1997). Therefore, the 
conceptual question that validity seeks to answer is the following: "Does the 
instrument or measurement approach measure what it is supposed to 
measure?" If so, then the instrument or measurement approach is said to 
be valid because it accurately assesses and represents the construct of 
interest. 

Validity and reliability are interconnected concepts (Sullivan & Feldman, 
1979). This can be demonstrated by the fact that a measurement cannot 
be valid unless it is reliable. Remember that validity is concerned not 
only with what is being measured, but also how well it is being measured. 
Think of it this way: If you have a test that is not reliable, how can it accu- 
rately measure the construct of interest? Reliability, or consistency, is 
therefore a hallmark of validity. Note, however, that a measurement strat- 
egy can be reliable without being valid. The measurement strategy might 
provide consistent scores over time, but that does not necessarily mean it 
is accurately measuring the construct of interest. 

Consider an example in which you choose to use in your study an in- 
strument that purports to measure depression. It produces reliable scores 
as evidenced by a high test-retest reliability coefficient. In other words, 
there is a high positive correlation between the pretest and posttest scores 
on the same measure. On further inspection, however, you notice that the 
content of the instrument is more closely related to anxiety. You are measuring 
something reliably, but at this point it might not be depression. In 
other words, the instrument, though reliable, might not be a valid measure 
of depression; instead, it might be a valid measure of anxiety. 

As we discussed earlier in this chapter, the accurate measurement of the 
constructs and variables in a study is a critical component of research. The 
most well-designed study is meaningless and a waste of time and resources 
if the independent and dependent variables cannot be identified, concep- 
tualized, operationalized, and quantified. The validity of measurement ap- 
proaches is therefore a critical aspect of the overall design process. How, 
then, is the validity of a measurement strategy established? Like reliability, 
validity is determined by considering the relationship, either quantitatively 
or qualitatively, between the test or measurement strategy and some ex- 
ternal, independent event (Groth-Marnat, 2003). The most common 
methods for demonstrating validity are referred to as content-related, cri- 
terion-related, and construct-related validity (Campbell, 1960). 

Content-related validity refers to the relevance of the instrument or mea- 
surement strategy to the construct being measured (Fitzpatrick, 1983). 
Put simply, the measurement approach must be related to the construct 
being measured. Although this concept is usually applied to the develop- 
ment and critique of psychological and other forms of tests, it is also ap- 
plicable to most forms of measurement strategies used in research. 

The approach for determining content validity starts with the opera- 
tionalization of the construct of interest. The test developer defines the 
construct and then attempts to develop item content that will accurately 
capture it. For example, an instrument designed to measure anxiety should 
contain item content that reflects 
design and methodology. A significant amount of research, especially in 
psychology, is conducted using preexisting, commercially available instruments 
(see Rapid Reference 4.8). However, a researcher might be interested 
in studying a variable that cannot be measured with an existing instrument 
or test — or perhaps the use of commercially available instruments might 
be cost prohibitive. This is a relatively common situation that should not 
bring the study to a grinding halt. Most forms of research do not require 
the use of preexisting or expensive measurement strategies. It is not unusual 
for researchers to develop their own measures or measurement 
strategies. This is a legitimate approach to data collection as long as the 
measure or strategy accurately captures the construct of interest. 

Rapid Reference 4.8 

Commercially Available Instruments and Measurement Strategies 

A huge number of measurement instruments are commercially available 
to researchers. They are particularly abundant in the areas of psychological 
and educational research. Researchers must be careful to consider a 
number of factors when deciding on whether an existing test is appropriate 
for data collection in a research study. A consideration of the psychometric 
properties (validity and reliability) is always an essential first step. 
Interested readers are referred to the latest editions of the Mental 
Measurements Yearbook and Tests in Print, which provide psychometric data 
and reviews for a wide variety of measurement materials (Impara & Plake, 
1998; Murphy, Impara, & Plake, 1999). What follows is a nonexhaustive list 
of other factors that should be considered when evaluating a test: 

• Reliability 
• Validity 
• Cost 
• Time needed to administer 
• Reading level 
• Test length 
• Theoretical soundness 
• Norms 
• Standardized administration procedure 
• Well-documented manual 

Consider the following example. A researcher is interested in studying 
aggression in young children. The researcher consults the literature only 
to find that there is no preexisting measure for quantifying aggression for 
the age group under consideration. Rather than abandoning the project, 
the researcher decides to create a measure to capture the behavior of 
interest. First, "aggression" must be operationalized. In this case, our re- 
searcher is interested in studying physical aggression, so the researcher de- 
cides to operationalize aggression as the number of times a child strikes 
another child during a certain period of time. A checklist of items related 
to this type of aggression is then developed. The researcher observes chil- 
dren in a variety of settings and records the frequency of aggressive be- 
havior and the circumstances surrounding each event. Although there are 
no psychometric data available for this approach, it is apparent that the 
measurement strategy has content validity. The items and the approach 
clearly measure the construct of aggression in young children as opera- 
tionalized by the researcher. 

Another effective approach to determining the validity of an instrument 
or measurement strategy [...] 

[...] time in the future. If the measure is compared to an outside criterion that 
is measured at the same time, it is then referred to as concurrent validity. If 
the measure is compared to an outside criterion that will be measured in 
the future, it is then referred to as predictive validity. 

Again, an example may help clarify this concept. Let's assume that a re- 
searcher is using an instrument or has developed another measurement 
strategy to capture the construct of depression. There are a number of 
ways that criterion validity could be determined in this case. The measure 
would have concurrent criterion validity if the measure indicated depres- 
sion and the participant met diagnostic criteria for depression at the same 
time. When both suggest the presence of depression, then we have the be- 
ginnings of criterion validity. The measure would have predictive criterion 
validity if the measure indicated depression and the participant met diag- 
nostic criteria for depression at some point in time in the future. 
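The depression example can be expressed as a simple comparison between what the measure indicates and what the outside criterion shows. The sketch below uses invented data for eight hypothetical participants and summarizes each comparison as a plain proportion of agreement; that summary statistic, the variable names, and the helper `agreement_rate` are illustrative assumptions, not a prescribed procedure.

```python
def agreement_rate(measure_flags, criterion_flags):
    """Proportion of participants for whom the measure and the criterion agree."""
    pairs = list(zip(measure_flags, criterion_flags))
    return sum(m == c for m, c in pairs) / len(pairs)


# Hypothetical results for eight participants (True = depression indicated).
measure_indicates = [True, True, False, False, True, False, True, False]

# Outside criterion assessed at the same time as the measure.
diagnosed_now = [True, True, False, False, True, False, False, False]

# Outside criterion assessed at some later point in time.
diagnosed_later = [True, False, False, False, True, True, True, False]

concurrent = agreement_rate(measure_indicates, diagnosed_now)
predictive = agreement_rate(measure_indicates, diagnosed_later)

print(f"concurrent agreement: {concurrent:.3f}")
print(f"predictive agreement: {predictive:.3f}")
```

The only difference between the two checks is when the criterion is measured, which mirrors the distinction between concurrent and predictive validity drawn above.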

The final concept that we will discuss with respect to demonstrating the 
validity of an instrument or measurement strategy is construct validity. 
Construct validity assesses the extent to which the test or measurement strat- 
egy measures a theoretical construct or trait (Groth-Marnat, 2003). Al- 
though there are numerous approaches for determining construct validity, 
we will focus on the two most common methods: convergent and diver- 
gent validity (Bechtold, 1959; Campbell & Fiske, 1959). Again, these con- 
cepts are best illustrated through an example. The first approach is to ex- 
plore the relationship between the measure of interest and another 
measure that purportedly captures the same construct (i.e., convergent valid- 
ity). Consider our depression example. If the instrument or strategy we 
were using in our depression study were accurately capturing the construct 
of depression, we would expect that there would be a strong relationship 
between the measurement in question and other measures of depression. 
This relationship would be expressed as the correlation between the two 
approaches, or a correlation coefficient. A strong positive correlation between 
the two measures would suggest construct validity. Construct validity can 
also be demonstrated by showing that two constructs are unrelated (i.e., di- 
vergent validity). For example, we would not expect our measure of depres- 
sion to have a strong positive correlation with a measure of happiness. In 
[...] verges with the measurement of 
similar or different constructs. 

So far, we have considered various 
basic issues related to measurement. We have highlighted the importance 
of scales of measurement and how they can guide data collection. Our dis- 
cussion of psychometrics pointed out the importance of considering reli- 
ability and validity when choosing a measurement instrument or approach 
to quantify the independent and dependent variables under consideration. 
These are important considerations, but this chapter would not be com- 
plete without a discussion of some of the different methods and ap- 
proaches used for collecting the data for the constructs of interest. Re- 
member that the constructs of interest in any research study tend to be 
defined in terms of independent and dependent variables. 

So, how do we measure our independent and dependent variables? 
They are, after all, the focus of any study. The number of available mea- 
surement strategies is staggering, and is sometimes limited only by the 
researcher's imagination and choice of research question. The choice of 
strategy also tends to vary by research question and research design, which 
is why it is difficult to account for every type of measurement approach. 
Despite this, the choice of measurement strategy is usually driven by a va- 
riety of factors that progress from general to specific. 

The broadest consideration is always the nature of the research ques- 
tion and the independent and dependent variables. In other words, the 
researcher decides how best to measure the independent and dependent 
variables with the ultimate goal being to answer the research question. Ad- 
dressing this broad and all-important choice requires the consideration of 
more specific factors. 

For example, our earlier discussion highlighted the importance of scales 
of measurement. At what level should we try to measure our variables, 
knowing that this decision can affect our ability to employ certain statisti- 
cal techniques during the data analysis stage? At this point, the thought 
might come to mind that all the researcher has to do is find a way to mea- 
sure the variables of interest at the interval or ratio level of measurement. 
Although this might allow for the use of preferred statistical techniques, it 
is not always possible or even desirable to measure variables at the interval 
and ratio levels because not all variables lend themselves to these levels of 
measurement. Take a moment to think about all of the interesting and crit- 
ically important variables that are measured by the nominal or ordinal 
scales of measurement. Gender, race, ethnicity, religious affiliation, em- 
ployment status, and political party affiliation are all examples of nominal 
or ordinal data that are common in many forms of social science research. 
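One practical consequence of the level of measurement is the kind of summary that makes sense for a variable. The following sketch uses invented survey records; the variable names and the four-step education ranking are hypothetical. It pairs each level with a summary that is commonly considered appropriate for it: the mode for nominal data, the median for ordinal data, and the mean for interval or ratio data.

```python
from statistics import mean, median, mode

# Hypothetical survey records for five participants.
party = ["Independent", "Democrat", "Republican", "Democrat", "Democrat"]  # nominal
education_rank = [1, 3, 2, 2, 4]  # ordinal: 1 = lowest level ... 4 = highest level
age_years = [23, 35, 41, 29, 52]  # ratio

most_common_party = mode(party)            # categories can only be counted
median_education = median(education_rank)  # ranks can be ordered, not averaged
mean_age = mean(age_years)                 # equal intervals allow a mean

print("most common party:", most_common_party)
print("median education rank:", median_education)
print("mean age:", mean_age)
```

Averaging the party labels would be meaningless, which is one reason the scale of measurement constrains the statistical techniques available at the analysis stage.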

Another factor might be related to the psychometric properties of the 
measurement strategy. Although reliability and validity are usually consid- 
ered primarily in the context of psychological tests and other instruments, 
the concepts are important to consider in all types of measurement. The 
fact that you are not using a psychological test or other psychometrically 
validated instrument does not mean that reliability and validity are no 
longer important considerations. Regardless of what you are measuring 
and how you do so, that measurement approach should measure what it 
purports to measure and do so in a consistent fashion. 
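For a researcher-built measure, one way to examine whether it "measures what it purports to measure" is to compare it with other measures, as in the convergent and divergent validity checks described earlier in the chapter. The sketch below is a minimal illustration with invented scores for eight hypothetical participants; the variable names and the helper `pearson_r` are assumptions made for the example.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for the same eight participants on three measures.
new_depression_measure = [10, 14, 9, 20, 25, 17, 12, 22]
established_depression_measure = [11, 15, 8, 19, 27, 16, 13, 21]
happiness_measure = [30, 26, 31, 18, 12, 22, 28, 15]

# Convergent evidence: correlation with a measure of the same construct.
convergent_r = pearson_r(new_depression_measure, established_depression_measure)

# Divergent evidence: correlation with a measure of a different construct.
divergent_r = pearson_r(new_depression_measure, happiness_measure)

print(f"convergent r = {convergent_r:.2f}")
print(f"divergent r  = {divergent_r:.2f}")
```

In this invented data set the new measure tracks the established depression measure closely and does not show a strong positive correlation with happiness, which is the pattern one would hope to see.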

For psychological and other tests, a related issue is whether the instru- 
ment is appropriate for the population the researcher is studying. For ex- 
ample, consider a case in which a researcher wants to use an established, 
commercially available instrument to assess levels of depression in the el- 
derly. The researcher would have to make certain that the test developers 
considered and captured this population when developing the instrument. 
If they did not, then it would be inappropriate to use the instrument to 
study depression in this population. 

Availability is another important consideration when selecting a mea- 
surement strategy. What approaches, if any, already exist for measuring the 
construct of interest? One might want to consider established forms of 
measurement, such as psychometrically based tests. Instruments of this 
type can be researched by consulting the most recent version of the Men- 
tal Measurements Yearbook. For example, there is a wide variety of psycho- 
metrically sound instruments available for the measurement of depression 
and personality. Another approach might be to review related research to 
see how others have measured the construct or similar constructs. The lit- 
erature might suggest what instrument has been used most often to mea- 
sure the construct of interest with the same population that you are inter- 
ested in. Or, if there is no instrument available, it might suggest an 
appropriate strategy for capturing the construct. For example, previously 
conducted research might provide a framework for designing a unique as- 
sessment strategy for quantifying specific behavioral problems with young 
children. Note that original research questions might require the develop- 
ment of unique and specialized assessment instruments and strategies. 

Cost is another consideration. Funding tends to vary from study to 
study. Some studies are well funded, while others are conducted with little 
or no funding. Those of you who conducted dissertation research with ac- 
tual participants probably have some experience with the little-or-no- 
funding category. One of the primary drawbacks of using commercially 
available instruments is that they can be costly, hence the expression 
"commercially" available. There is considerable variation in the cost asso- 
ciated with various instruments. Some are very reasonable and others are 
cost prohibitive. The cost consideration is partially dependent on how 
many participants are in the study. The more participants to be measured 
on some construct, the higher the cost. In studies for which money is a se- 
rious consideration, the use of some commercially available instruments 
might be prohibitive. This might require the researcher to develop or cre- 
ate a measure or assessment strategy to capture the constructs of interest. 


Although this is relatively common, there are some potential problems 
that arise from creating a new measure or measurement strategy. The first 
concern is that new instruments and strategies might have questionable 
reliability and validity. It cannot be assumed that the instrument or strat- 
egy is reliable or valid. At a minimum, the researcher will have to take steps 
to demonstrate the reliability and validity of the measurement approach. 
After all, you have to measure variables in a reliable and valid fashion be- 
fore you can make any statements about the relationship between them, 
regardless of what statistical analysis might suggest. 

Another issue regarding unique measurement approaches and instru- 
ments relates to the existing body of scientific literature in a given topic 
area. There are certain instruments and approaches that tend to appear in 
the scientific literature for the study of given topics. For example, there are 
a number of common measures of personality and depression that appear 
consistently in the research literature. Studies using these instruments can 
add to an existing body of literature. Conversely, studies using obscure or 
unique instruments and approaches, although valuable in and of them- 
selves, might not be as relevant to that body of literature because the mea- 
surement strategies are not consistent and therefore not directly compa- 
rable. 

Training is another factor to consider when selecting a measurement in- 
strument or strategy. Training is important for two reasons: The first re- 
lates to the training of the researcher and is usually related primarily to the 
use of commercially available psychological and related tests. Many test 
providers have minimum user requirements. In our case, that would mean 
that the researcher must meet certain educational and/or training require- 
ments before the company will permit the use of the instrument in the 
study. Although the requirements vary by test, the typical user must have 
an advanced degree in the social sciences or education, and/or have spe- 
cific training in psychometrics. In some instances, test developers will al- 
low the use of these instruments by less-qualified individuals if they attend 
a training seminar that provides a certification in the proper use of the in- 
strument. 

The second reason relates to training in a broad sense. The use of measurement 
instruments and strategies, whether commercially available or 
not, requires a theoretical foundation related to the construct of interest. 
For example, a researcher measuring some aspect of personality should be 
familiar with personality theory and the theoretical approach adopted by 
the instrument or strategy in question. Similarly, a researcher interested in 
evaluating the effectiveness of a behavioral modification system for chil- 
dren should be familiar with the theoretical underpinnings and practical 
application of concepts related to behavior modification before designing 
the measurement strategy. Remember that all validation begins after a 
concept has been given an accurate operational definition that reflects the 
construct of interest. Appropriate training assists in this process and is the 
first step in addressing the validity of the measurement strategy or instru- 
ment. 

The time needed to conduct the measurement and the ease of its use are 
the last two factors that we will consider. Researchers should let the con- 
cept of parsimony guide them here. Generally, parsimony refers to selecting 
the simplest explanation for a phenomenon when there are competing 
explanations available (Kazdin, 2003c). The key concept here is simplicity. 
Researchers should attempt to measure the variables of interest as effi- 
ciently and accurately as possible. Remember the importance of reliability 
and validity. Depending on the construct, a longer and more complicated 
assessment will not necessarily provide a more accurate measurement 
than a strategy that is less complicated and takes half the time. In addition, 
the likelihood of mistakes, fatigue, or inattention among both researchers 
and participants might become more prevalent as the measurement strat- 
egy becomes more time intensive and complicated. This, in turn, could af- 
fect the accuracy of the data. In short, avoid unnecessarily long and com- 
plicated assessment procedures whenever possible. 

METHODS OF DATA COLLECTION 

With these factors in mind, we will now discuss some of the more com- 
mon approaches to data collection and measurement in research. Again, 
there are many different approaches to data collection, and this discussion 
is not intended to be exhaustive of the subject matter. Despite this, there 
are certain broad categories that encompass the more common types of 
data collection techniques. Generally, and not surprisingly, the research 
question and the nature of the variables under investigation usually drive 
the choice of measurement strategy for data collection. 

We have mentioned the use of psychological testing and other similar 
commercially available instruments throughout this chapter. The use of 
this type of testing in research is very common, especially in psychology, 
education, and other social sciences. A brief survey of available instru- 
ments suggests that we can capture a wide variety of factors related to the 
human experience. For example, instruments exist that allow researchers 
to measure personality, temperament, adjustment, symptom level, behav- 
ior, career interest, memory, academic achievement and aptitude, emo- 
tional competence, and intelligence. These instruments are attractive to 
researchers because they tend to have established reliability and validity, 
and they eliminate the need to develop and validate an instrument from 
scratch. Many of these instruments also produce data at the interval and 
ratio levels, which is a prerequisite feature for certain types of statistical 
analyses. The development of new instruments is best left to specialists 
with extensive training in psychological testing, psychometrics, and test 
development. In other words, always consider existing instruments as data 
collection methods before developing one of your own. A poorly designed 
measurement strategy can confound the results of even the best research 
design. Again, let reliability and validity be your guides. 

Although testing is common, it is not the only method for data collec- 
tion available to researchers. There are often times when it is necessary to 
adopt another approach to data collection. As we discussed earlier, there 
are many reasons that this might be the case. For example, not all variables 
of interest have been operationalized in the form of standardized tests, or 
some research questions might require unique or different approaches. 
Cost and time constraints might also be important considerations. In cases 
like these, the researcher might have to consider and adopt other data col- 
lection strategies. In many cases, these strategies are just as valid as, and are 
even preferable to, the use of formal testing. 
Some of these alternative approaches, as summarized in Rapid 
Reference 4.9, include interviewing, global ratings, observation, 
and biological measures. As we will see, sometimes the most efficient 
data collection techniques are also the simplest. 

Rapid Reference 4.9 

Main Approaches to Measurement and Data [Collection] 

• Formal testing (psychological, educational, academic, intelligence [...]) 
• Interviewing 
• Global ratings 
• Observation 
• Biological measures 

A thorough interview is a form of self-report that is a relatively 
simple approach to data collection. Although simple, it can produce 
a wealth of information. An interview can cover any number 
of content areas and is a relatively inexpensive and efficient way 
to collect a wide variety of data that does not 
require formal testing. One of the most common uses of the interview is 
to collect life-history and biographical data about the research participants 
(Anastasi & Urbina, 1997; Stokes, Mumford, & Owens, 1994). The effec- 
tiveness of an interview depends on how it is structured. In other words, 
the interview should be thought out beforehand and standardized so that 
all participants are asked the same questions in the same order. Similarly, 
the researchers conducting the interview should be trained in its proper 
administration to avoid variation in the collection of data. Interviews are 
a relatively common way of collecting data in research and the data they 
collect and the forms they take are limited only by the requirements of the 
research question and the related research design. One drawback of using 
an interview procedure is that the data obtained may not be appropriate 
for extensive statistical analysis because they simply describe a construct 
rather than quantifying it. 

Examples of interviews are not difficult to identify. Employment inter- 
views are a classic example. Although they are not typically used in re- 
search studies, their main goal is to gather data that will allow a company 
to answer the research question (so to speak) of whether someone would 
make a good employee. Interviews are also an essential component of 
most types of qualitative research, which is briefly discussed in Chapter 5. 
For example, if we were interested in the impact of childhood trauma on a 
participant's current functioning, we might construct an interview to cap- 
ture his or her experiences from childhood through adulthood. 

Like interviews, global ratings are another form of self-report that is 
commonly used as a data collection technique in research. Unlike an in- 
terview, this approach to measurement attempts to quantify a construct or 
variable of interest by asking the participant to rate his or her response to 
a summary statement on a numerical continuum. This is less complex than 
it sounds, and everyone has been exposed to this data collection approach 
at one point in time or another. If a researcher were interested in measur- 
ing attitudes toward a class in research methods, he or she could develop 
a set of summary statements and then ask the participants to rate their at- 
titudes along a bipolar continuum. One statement might look like this: 

On a scale of 1 to 5, please rate the extent to which you enjoy the
research-methods class.

1          2          3          4          5
Hate it            Neutral              Love it



In this example, the participant would simply circle the appropriate num- 
ber that best reflects his or her attitude toward the research-methods class. 
The use of global ratings is also common when asking participants to rate 
emotional states, symptoms, and levels of distress. 

The strength of global ratings is that they can be adapted for a wide va- 
riety of topics and questions. They also yield interval or ratio data. Despite 
this, researchers should be aware that such a rating is only a global measure 
of a construct and might not capture its complexity or more subtle nu- 
ances. For example, the previous example may tell us how much someone 
enjoys a certain research-methods class, but it will not tell us why the
person either loves it or hates it.

Observation is another versatile approach to data collection. This approach
relies on the direct observation of the construct of interest, which
is often some type of behavior. In essence, if you can observe it, you can 
find some way of measuring it. The use of this approach is widespread in 
a variety of research, educational, and treatment settings. 

Let's consider the use of observation in a research setting. This ap- 
proach is an efficient way to collect data when the researcher is interested 
in studying and quantifying some type of behavior. For example, a re- 
searcher might be interested in studying cooperative behavior of young 
children in a classroom setting. After operationalizing "cooperative be- 
havior" as sharing toys, the researcher develops a system for quantifying 
the behavior. In this case, it might be as simple as sitting unobtrusively 
in a corner of the classroom, observing the behavior of the children, and 
counting the number of times that they engage in cooperative behavior. 
Alternatively, if we were interested in studying levels of boredom in a 
research-methods class, we could simply count the number of yawns or 
number of times that someone nods off. 

As with other forms of data collection, the process of quantifying ob- 
servations should be standardized. The behavior in question must be ac- 
curately operationalized and everyone involved in the data collection 
should be trained to ensure accuracy of observation. Proper operational- 
ization of the variable and adequate training should help ensure adequate 
validity and interrater reliability. Videotaping and multiple raters are fre- 
quently used to confirm the accuracy of the observations. The use of ob- 
servational methods usually produces frequency counts of a particular 
behavior or behaviors. These data are frequently at the interval and ratio 
level. 
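
The frequency counts and rater checks described above can be illustrated with a short sketch. The Python example below is only an illustration under stated assumptions: the interval-by-interval codes are invented, and simple percent agreement is used as one possible index of interrater reliability (the text does not prescribe a particular statistic).

```python
def percent_agreement(rater_a, rater_b):
    """Share of observation intervals on which two raters recorded the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of intervals")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical data: 1 = toy sharing observed in the interval, 0 = not observed.
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

frequency_a = sum(rater_a)  # frequency count of the behavior recorded by rater A
frequency_b = sum(rater_b)  # frequency count of the behavior recorded by rater B
agreement = percent_agreement(rater_a, rater_b)

print(frequency_a, frequency_b, agreement)
```

A low agreement value would suggest that the operational definition or the rater training needs more work before the counts are analyzed.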

Obtaining biological measures is another strategy for collecting research 
data. This approach is common in medical and psychobiological research. 
It often involves measuring the physiological responses of participants 
to any number of potential stimuli. The most common examples of re- 
sponses include heart rate, respiration, blood pressure, and galvanic skin 
response. As with all of the forms of measurement that we have discussed,
operationalization and standardization are essential. Consider a study
investigating levels of anxiety in response to a certain aversive stimulus.
We could use any of the other measurement approaches to gather the data we
need regarding anxiety, but we choose instead to collect biological data
because it is very difficult for people to regulate or fake their
responses. We operationalize anxiety as scores on certain physiological
responses, such as heart rate and respiration. Each participant is exposed
to the stimulus in the exact same fashion and then is measured across the
biological indicators we chose to operationalize anxiety. The data obtained
from biological measures are frequently at the interval or ratio level.

DON'T FORGET

Multiple Measurement Strategies

Multiple measurement strategies can be used in a research study, even if
they are all used to measure the same construct or variable. For example, a
psychological test, an interview, and a global rating could all be used to
measure the construct of depression. This may be considered an optimal
approach, as convergence on multiple measures would increase overall
confidence in a study's findings.

SUMMARY 

This chapter focused on important issues and considerations related to 
various aspects of data collection and measurement. Measurement strate- 
gies are an integral aspect of research design and methodology that should 
be considered at the earliest stages of design conceptualization. Special 
consideration should be given to scales of measurement, psychometric 
properties, and specific measurement strategies for collecting data. Ulti- 
mately, measurement is critical in research because it allows researchers to 
quantify abstract constructs and variables. This is an essential step in
exploring the relationship between various independent and dependent
variables.

Putting It Into Practice



An Example 

Suppose a researcher is asked to design a study to examine student atti- 
tudes toward two different research-methods classes taught by two differ- 
ent instructors. The researcher is told that the purpose of the study is to 
determine whether there are significant differences in satisfaction between
the two classes. The referral source cannot provide a significant level of
funding. The researcher starts by clarifying the research question and the
variables to be quantified and studied. The referral source wants to
quantify whether there are significant differences between the two classes'
satisfaction levels with regard to a variety of class components, such as
class size, quality of the instructor, usefulness of the textbook, pace of
the class, and so on. These components are the variables of interest. The
referral source wants to compare the two classes, which suggests that
certain parametric statistical tests (e.g., a t-test) will be used to
determine whether there are differences between the two classes on the
variables of interest. Accordingly, the researcher decides that the variables of
interest should be measured at the interval or ratio level. 

The key question is what measurement strategy to use. The researcher
needs a measurement strategy that allows for measurement at the interval
or ratio level. Not surprisingly, a review of the Mental Measurements
Yearbook and the literature reveals that there are no existing measures of
student satisfaction toward certain components of a research-methods class.
Furthermore, an interview will not provide interval or ratio data, and it 
might be inappropriate to take biological measurements in this setting be- 
cause it would certainly be cost prohibitive and would disrupt the flow of 
the classes. Behavioral observation might allow us to infer satisfaction, but it 
is not a direct measure of the variables we have been asked to assess. Re- 
member that what is being measured is satisfaction with a number of dif- 
ferent course components, and not just general satisfaction with the class. 

The researcher decides to use global ratings. Questions are designed to
capture the variables of interest and the students will be asked to respond
on a scale from 1 to 5, with 5 suggesting extreme satisfaction and 1
suggesting extreme dissatisfaction. This approach is cost effective and will
provide data at the interval level (because there is no absolute zero on the
scale), which will allow for the use of the preferred parametric statistical
technique. Wanting to be thorough, the researcher includes an open-ended
question (an interview question) with each global rating so that the
students can elaborate on their numerical rating with narrative material. 
Although this type of information does not lend itself to statistical analysis, 
it should provide more specifics as to why the students are satisfied or 
dissatisfied with various class components. The data are collected and ana- 
lyzed, and the results, perhaps not surprisingly, suggest that everyone is 
dissatisfied with everything about research methods! 
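
To make the comparison step in this example more concrete, the following Python sketch computes a t statistic for one class component rated on the 1-to-5 scale. It is an illustration only: the ratings are invented, and Welch's form of the t statistic is an assumption chosen for the sketch (the example above specifies only "a t-test"). Turning the statistic into a p value would require a t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical 1-to-5 satisfaction ratings for "pace of the class".
class_a = [2, 3, 1, 2, 3, 2, 1, 2]
class_b = [3, 4, 2, 3, 4, 3, 2, 3]

t_statistic = welch_t(class_a, class_b)
print(mean(class_a), mean(class_b), round(t_statistic, 3))
```

The same function could be applied to each class component in turn, one variable of interest at a time.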



TEST YOURSELF



1. ________ is often defined as a process through which researchers
describe, explain, and predict the phenomena and constructs of our daily
existence.

2. ________ data constitute the highest level of measurement and allow for
the use of sophisticated statistical techniques.

3. ________, or qualitative, data are the attributes, characteristics, or
categories that describe an individual and are used predominantly as a
method of describing and categorizing. ________, or quantitative, data refer
to differing amounts or degrees of an attribute, and these data reflect
relative quantity or distance.

4. A measurement can be valid, but not reliable. True or False?

5. ________ and ________ are two important psychometric considerations
when selecting psychological and other tests.

Answers: 1. Measurement; 2. Ratio; 3. Nonmetric, Metric; 4. False (A measure must be
reliable to be valid.); 5. Reliability, validity

GENERAL TYPES OF RESEARCH DESIGNS AND APPROACHES



Once the researcher has determined the specific question to be 
answered and has operationalized the variables and research 
question into a clear, measurable hypothesis, it is time to con- 
sider a suitable research design. Although there are endless ways of classi- 
fying research designs, they usually fall into one of three general cate- 
gories: experimental, quasi-experimental, and nonexperimental. This 
classification system is based primarily on the strength of the design's ex- 
perimental control. To determine the classification of a particular research 
design, it is helpful to ask several key questions. First, does the design
involve random assignment to different conditions? If random assignment
is used, the design is considered to be a randomized, or true, experimental
design. If random assignment is not used, then a second question must be
asked: Does the design use either multiple groups or multiple waves of
measurement? If the answer is yes, the design is considered
quasi-experimental. If the answer is no, the design would be considered
nonexperimental (see Trochim, 2001).
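
The two classification questions above amount to a small decision rule, which can be sketched in Python as follows. The function and parameter names are hypothetical and are introduced only for illustration.

```python
def classify_design(random_assignment, multiple_groups_or_waves):
    """Classify a research design using the two questions described in the text."""
    if random_assignment:
        # Question 1: participants are randomly assigned to conditions.
        return "experimental"
    if multiple_groups_or_waves:
        # Question 2: no random assignment, but multiple groups or waves of measurement.
        return "quasi-experimental"
    return "nonexperimental"


print(classify_design(True, True))
print(classify_design(False, True))
print(classify_design(False, False))
```

Note that the second question matters only when the answer to the first is no: a design with random assignment is classified as experimental regardless of how many groups or waves it includes.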

Although each of the three types of research designs can provide useful
information, they differ greatly in the degree to which they enable
researchers to draw confident causal inferences from a study's findings (as
discussed in Chapter 1). In this chapter, we will review each of the three
classes of research design, the ways that each type of research design is
applied, and the overall strengths and weaknesses of each type of research
design.



EXPERIMENTAL DESIGNS

A true experimental design is one in which study participants are ran- 
domly assigned to experimental and control groups. We have discussed 
randomization in previous chapters, so this chapter will simply highlight 
the importance of randomization in terms of the strength of a research de- 
sign. Although randomization is typically described using examples such 
as rolling dice, flipping a coin, or picking a number out of a hat, most stud- 
ies now rely on the use of random numbers tables to help them assign their 
research participants (as discussed in Chapters 2 and 3). 

A random numbers table is nothing more than a random list of numbers 
displayed or printed in a series of columns and rows. Typically, computer 
programs that generate such lists allow you to request a specific quantity 
and range of numbers to be generated. To use a random numbers table to 
assign study participants to groups, you must first determine the exact 
numbers that you will use to determine the assignments. For example, if 
you have three groups or conditions, you may use the numbers 1, 2, and 3.
Alternatively, if you were assigning participants to two groups, you could 
use the numbers 1 and 2, or simply odd or even numbers, to determine the 
group assignments. The important point is that you define the assignment 
criteria ahead of time, so that your selections are not biased and remain 
purely random. 
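
As a rough illustration of how a computer program can stand in for a printed table, the Python sketch below generates a requested quantity of numbers within a requested range and uses them directly as group assignments. The function name and the seed value are hypothetical, and the standard library's pseudorandom generator is an assumption made for the sketch, not a tool named in the text.

```python
import random


def simple_random_assignment(n_participants, n_groups, seed=None):
    """Draw one group number (1..n_groups) per participant, in enrollment order."""
    rng = random.Random(seed)  # a fixed seed makes the assignment list reproducible
    return [rng.randint(1, n_groups) for _ in range(n_participants)]


# The assignment criteria (groups 1 through 4) are fixed before any numbers are drawn.
assignments = simple_random_assignment(100, 4, seed=2005)

group_sizes = {group: assignments.count(group) for group in range(1, 5)}
print(assignments[:10])
print(group_sizes)
```

Printing the group sizes is a quick way to see that simple randomization does not guarantee groups of equal size.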



DON'T FORGET

Random Numbers Table

A random numbers table is nothing more than a random list of numbers
displayed or printed in a series of columns and rows. Using a random
numbers table is one effective way to randomly assign participants to
groups within a research study.

After selecting your assignment criteria, you must randomly identify a
starting place in the random numbers table. This is usually done by either
selecting a starting place on the table before beginning (e.g., top right
of third column) or simply closing your eyes and randomly pointing to a
location on the table, which will serve as the starting point. Once you
have selected a starting point, you will simply move through the list
(either down the columns or across the rows) and identify each instance
that numbers in your selected range appear until you have group
assignments for your entire sample of participants.

To illustrate, assume that you are planning to assign 100 participants to 
one of four different groups. You begin by defining the numbers 1, 2, 3, 
and 4 as the criteria for your group assignments. You then randomly point 
to a spot on the table from which to begin, and go down the columns of 
numbers one by one listing each appearance of 1, 2, 3, or 4, while skipping
all other numbers. Once you have listed 100 numbers, you will be done. 
The first number that you listed will determine the first participant's as- 
signment, the second number will determine the second participant's as- 
signment, and so forth. For example, using the table below, assume that we 
begin with number 0480 in the top row, left-most column of the table. If 
we worked our way down the columns, from left to right, listing
appearances of 1, 2, 3, or 4 (in bold type) in the last digit of each
number, we would wind up with the following series of assignments: 2, 4, 1,
1, 3, 3, 2, 3, 1, 3, 4, 1, 3, 1, 4, 2.



0480  5011  1536  2011  1647  9174
2362  6573  5595  5393  0995  9198
4134  8360  2527  7265  6393  4809
2167  3093  6243  1684  7856  6376
7570  9975  1837  6656  6121  1782
7921  6902  1008  2751  7756  3498
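
The table walk described above can be sketched in a few lines of Python. The sketch assumes the table is laid out as six rows of six four-digit numbers (a layout inferred from the sequence of assignments reported in the text), and the function name is hypothetical.

```python
# Random numbers table, entered row by row (assumed 6 x 6 layout).
TABLE = [
    ["0480", "5011", "1536", "2011", "1647", "9174"],
    ["2362", "6573", "5595", "5393", "0995", "9198"],
    ["4134", "8360", "2527", "7265", "6393", "4809"],
    ["2167", "3093", "6243", "1684", "7856", "6376"],
    ["7570", "9975", "1837", "6656", "6121", "1782"],
    ["7921", "6902", "1008", "2751", "7756", "3498"],
]


def assignments_from_table(table, valid=(1, 2, 3, 4)):
    """Walk down each column, left to right, keeping last digits that match a group."""
    picks = []
    for col in range(len(table[0])):      # columns, left to right
        for row in range(len(table)):     # down each column
            last_digit = int(table[row][col][-1])
            if last_digit in valid:       # skip all other numbers
                picks.append(last_digit)
    return picks


series = assignments_from_table(TABLE)
print(series)
```

Under the assumed layout, the walk reproduces the series of 16 assignments listed in the text, starting with 2, 4, 1, 1 and ending with 1, 4, 2.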



Although the standard randomization procedure will ensure random- 
ized groups, it will not necessarily result in groups of equal size. To obtain 
randomized groups of equal sizes, you could use a block randomization
procedure. This procedure is carried out in the same manner as discussed,
except that participants are grouped into blocks. Each block will consist of
one assignment to each of the study groups. Therefore, the number of
participants per block is the same as the number of groups in the study.
Using the prior example, you would proceed down the columns listing each
appearance of 1, 2, 3, or 4 only once until the first block is full, before mov- 
ing to the second block of four assignments, and so forth, until you have 
assigned 100 participants into a total of 25 blocks of four. Regardless of 
the technique used to randomly assign participants to groups within a 
study, random assignment increases the likelihood that changes in the de- 
pendent variable are attributable to the independent variables rather than 
to extraneous factors or nuisance variables. 
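
The block randomization procedure can be sketched in Python as follows. This is a minimal illustration, assuming a pseudorandom number generator in place of a printed table; the function name and seed are hypothetical.

```python
import random


def block_randomize(n_participants, n_groups, seed=None):
    """Assign participants in blocks that contain each group exactly once."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = []
        while len(block) < n_groups:
            draw = rng.randint(1, n_groups)
            if draw not in block:  # list each group only once until the block is full
                block.append(draw)
        assignments.extend(block)
    return assignments[:n_participants]


result = block_randomize(100, 4, seed=1)
sizes = {group: result.count(group) for group in range(1, 5)}
print(result[:8])
print(sizes)
```

With 100 participants and four groups, the procedure yields 25 blocks of four and therefore 25 participants per group, while the order of assignments within each block remains random.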

For example, a researcher examining the effectiveness of a certain treat- 
ment will want to be confident that the experimental group (the group 
receiving the new treatment) does not differ from the control group (the 
group receiving an alternative or placebo intervention) at the start of the 
study. Otherwise, the researcher will be unable to confidently attribute any 
between-group differences that appear at the end of the study to the treat- 
ment rather than to some preexisting differences. Although the researcher 
could attempt to make the groups more comparable by matching the two 
groups on any number of variables, it would ultimately be impossible to 
make the groups identical. There are simply too many (perhaps an infinite 
number of) other individual differences that remain uncontrolled for and 
that may influence the study's outcome. 

For example, the researcher may carefully match the two groups on 
characteristics such as age, gender, race, and socioeconomic status with 
the belief that these variables may have an impact on treatment outcomes. 
Although this procedure may make the groups more similar, the groups 
may still differ on other potentially important yet unmeasured variables, 
such as level of intelligence, degree of motivation, or prior treatment ex- 
periences. The fact that the groups may differ on some unknown and un- 
measured variable substantially reduces the researcher's ability to attribute 
changes in the dependent variable to the independent variable and to draw 
valid causal conclusions from the data. Randomization, however, tends to 
distribute individual differences equally across groups so that the groups 
differ systematically in only one way: the intervention being examined in 
the study. 

It is primarily for this reason that in most instances, when feasible, the
randomized experimental design is the preferred method of research. Put
simply, it provides the highest degree of control over a research study, and 
it allows the researcher to draw causal inferences with the highest degree 
of confidence. In general, randomized or true experiments can be con- 
ducted using one of three main designs: (1) a randomized two-group 
posttest only or pretest-posttest design, (2) a Solomon four-group design, 
or (3) a factorial design. The following notation will be used to describe the 
different designs: 

X = experimental manipulation (independent variable); subscripts
identify different levels or groups of the independent variable
(e.g., X1, X2, X3), including a no-intervention or
alternative-intervention control group

Y = experimental manipulation (independent variable) other than X

O = observation 

R = indication that participants have been randomly assigned 
NR = indication that participants have not been randomly assigned 



Randomized Two-Group Design 

In their simplest form, true experiments are composed of two groups or 
two levels of an independent variable. Of course, as discussed in Chapter 
2, these designs could incorporate any number of levels of an independent 
variable and could thus consist of three, four, or any other number of 
groups. The primary purpose of this design is to demonstrate causality — 
that is, to determine whether a specific intervention (the independent vari- 
able) causes an effect (as opposed to being merely correlated with an ef- 
fect). 

For example, a researcher studying smoking cessation may randomly 
assign identified cigarette smokers either to a novel medication (experi- 
mental) group or to a comparison (control) group. There are several dif- 
ferent types of control or comparison groups that can be used in this type 
of design. The type of comparison group that is used largely depends on
the specifics of the research hypothesis and the factors that the researcher
wishes to control. For example, if the researcher wishes to examine 
whether the intervention is more effective than no treatment at all, the re- 
searcher may choose to use some form of placebo control group. The 
placebo control condition may involve a seemingly useful intervention, 
but one that has no demonstrable effects (e.g., a sugar pill). This would 
control for effects that may occur in the experimental groups as a result 
of experimenter attention or other forms of bias. Alternatively, if the re- 
searcher wants to know whether the intervention is superior to a standard 
treatment, the researcher would choose the standard intervention as the 
comparison group. There are two basic types of randomized two-group 
designs: the posttest only and the pretest-posttest design. 

Randomized Two-Group Posttest Only Design

In its most basic form, the two-group experimental design may involve 
little more than random assignment and a posttest, as depicted here:

R — X1 — O

R — X2 — O

Because individual characteristics are assumed to be equally distributed 
through randomization, there is theoretically no real need for a pretest to 
assess the comparability of the groups prior to the intervention. In this de- 
sign, random assignment ensures, to some degree, that the two groups are 
equivalent before treatment so that any posttreatment differences can be 
attributed to the treatment. This simple design encompasses all the neces- 
sary elements of a true randomized experiment: (1) random assignment, 
to distribute extraneous differences across groups; (2) intervention and 
control groups, to determine whether the treatment had an effect; and (3) 
observations following the treatment. 

Randomized Two-Group Pretest-Posttest Design

Despite the relative simplicity of the posttest only approach, most ran- 
domized experiments typically utilize the pretest-posttest design, which is 
depicted here:

R — O — X1 — O

R — O — X2 — O

The addition of a pretest has several important benefits. First, it allows the 
researcher to compare the groups on several measures following random- 
ization to determine whether the groups are truly equivalent. Although it 
is likely that randomization distributed most differences equally across the 
groups, it is possible that some differences still exist. This process of
measuring the integrity of random assignment is typically referred to as a
randomization check (see Rapid Reference 5.1). Researchers can often
statistically control for such preintervention differences if they are found.
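
A randomization check can be sketched roughly as follows. The Python example compares two groups on each pretest variable using a standardized mean difference; that index, the 0.25 cutoff, the variable names, and the data are all assumptions made for illustration, since the text calls only for statistical analyses that can examine differences between groups.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(group_a, group_b):
    """Difference in group means, expressed in pooled standard deviation units."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical pretest data: variable name -> (experimental group, control group).
baseline = {
    "age": ([34, 29, 41, 38, 30, 36], [33, 31, 40, 37, 29, 38]),
    "pretest_depression": ([22, 25, 19, 24, 21, 23], [15, 17, 14, 18, 16, 16]),
}

THRESHOLD = 0.25  # arbitrary cutoff chosen for this sketch

flagged = [
    name
    for name, (group_a, group_b) in baseline.items()
    if abs(standardized_difference(group_a, group_b)) > THRESHOLD
]
print(flagged)
```

Any variable flagged in this way would be a candidate for statistical control in the final analyses, provided it is also related to the outcome.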

Rapid Reference 5.1

Randomization Checks

The randomization check, as its name suggests, is the process of examining
the overall effectiveness of random assignment. The goal of this process is
to determine whether random assignment resulted in nonequivalent groups. In
performing randomization checks, researchers compare study groups or
conditions on a number of pretest variables. These typically include
demographic variables such as age, gender, level of education, and any
other variables that are measured or available prior to the intervention.
Importantly, randomization checks should look for between-group differences
on the baseline measures of the dependent variables because they are likely
to have the most impact on outcomes. Generally, randomization checks involve
the use of statistical analyses that can examine differences between groups
(as will be discussed in Chapter 7). If differences are found on certain
variables, the researcher should determine whether they are correlated with
the outcomes. Any such variables that are correlated with outcomes should
be controlled for in the final analyses.

The second major benefit of a pretest is that it provides baseline
information that allows researchers to compare the participants who
completed the posttest to those who did not. Accordingly, researchers can
determine whether any between-group differences found at the end of the
study are due to the intervention or merely to differential attrition of
participants across groups. Attrition is the loss of participants during the
course of the study, and the process of comparing completers with
noncompleters is typically referred to as an attrition analysis (see Rapid
Reference 5.2).

For example, consider a study in which we compare outpatient treat- 
ment to inpatient treatment for depression. After examining the posttest 
data, we conclude that outpatient treatment produced greater reductions 
in depression than the inpatient treatment. Although random assignment 
may have ensured that all participant differences were distributed equally 
at baseline, it did not ensure that all groups would be the same at follow- 
up. Therefore, it is possible that certain participants were more likely to 
drop out of one group than the other, resulting in differential attrition. In 
this example, clients with higher levels of depression may have been more 
likely to drop out of the outpatient treatment, which would explain the rel- 
ative success of outpatient over inpatient treatment. 

Inevitably, a certain proportion of study participants will not make it to
follow-up. Often referred to as mortality, attrition can have many negative
effects on the validity of a research study. First, it may substantially
diminish the size of an experimental sample, which could reduce the study's
statistical power and its ability to identify group differences if they
exist. Second, because participants who drop out are likely to be different
from those who complete, attrition may substantially limit the overall
generalizability of a study's findings. Third, and perhaps most important,
attrition from research is generally not randomly distributed (Cook &
Campbell, 1979) and appears to be systematically influenced by participant
characteristics, the nature of research interventions, the type of
follow-up methods employed, and many other variables. This can contribute to
highly systematic differences in attrition rates between research
conditions. Unfortunately, such differential attrition cannot be confidently
controlled for by random selection, random assignment, or any other
experimental research method (Cook & Campbell, 1979). As a practical matter,
when attrition occurs, it can never be definitively established whether
between-group differences in a particular study were caused by the
experimental intervention(s) or by differential attrition across conditions
(Campbell & Stanley, 1963; Cook & Campbell, 1979).

Rapid Reference 5.2

Attrition and Attrition Analysis

Attrition analysis is a method of examining the overall impact of research
attrition on the makeup of a study sample and the validity of a study's
findings. The goal of this procedure is to identify any differences between
those participants who complete the study and those who do not complete the
study. To conduct this type of analysis, researchers compare completers
versus noncompleters on a number of pretest variables. These may include
demographic and any other variables that are measured or available on
participants prior to the intervention. Generally, this process involves
the use of several statistical analyses.
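
The descriptive first step of an attrition analysis can be sketched as follows. The Python example uses invented records loosely modeled on the outpatient versus inpatient illustration; the field layout and function names are hypothetical, and a full analysis would add the statistical tests that this sketch omits.

```python
from statistics import mean

# Hypothetical records: (condition, baseline depression score, completed posttest?)
participants = [
    ("outpatient", 30, False), ("outpatient", 28, False), ("outpatient", 18, True),
    ("outpatient", 16, True), ("outpatient", 20, True), ("outpatient", 17, True),
    ("inpatient", 29, True), ("inpatient", 27, True), ("inpatient", 19, True),
    ("inpatient", 17, True), ("inpatient", 21, False), ("inpatient", 18, True),
]


def attrition_rate(records, condition):
    """Proportion of participants in a condition who did not complete the posttest."""
    group = [r for r in records if r[0] == condition]
    return sum(not r[2] for r in group) / len(group)


def baseline_by_completion(records, condition):
    """Mean baseline score for completers and for noncompleters in a condition."""
    completers = [r[1] for r in records if r[0] == condition and r[2]]
    dropouts = [r[1] for r in records if r[0] == condition and not r[2]]
    return mean(completers), mean(dropouts)


outpatient_rate = attrition_rate(participants, "outpatient")
inpatient_rate = attrition_rate(participants, "inpatient")
completer_mean, dropout_mean = baseline_by_completion(participants, "outpatient")
print(outpatient_rate, inpatient_rate, completer_mean, dropout_mean)
```

In these invented data, the outpatient condition loses more participants, and those who drop out have higher baseline depression scores than those who complete, which is the pattern of differential attrition described in the example.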

One obvious disadvantage of the pretest-posttest design is that the use 
of a pretest may ultimately make participants aware of the purpose of the 
study and influence their posttest results. If the pretest influences the 
posttests of both the experimental and control groups, it becomes a threat 
to the external validity or generalizability of a study's findings. This is be- 
cause the posttest will no longer reflect how participants would respond if 
they had not received a pretest. Alternatively, if the pretest influences the 
posttests of only one of the groups, it poses a threat to the internal valid- 
ity of a study. We discuss internal validity in detail in Chapter 6. 

Despite this drawback, the two-group experimental design may be seen 
as the gold standard in determining whether a new procedure (or inde- 
pendent variable) causes an effect. Researchers often employ this design 
in the early stages of an intervention's empirical validation. At these initial 
stages, the researcher's primary aim may simply be to examine the effec- 
tiveness of the intervention. This can be done easily and relatively inex- 
pensively by comparing the treatment to just one other group (typically a 
standard intervention or a placebo control). If the study's findings suggest 
that the treatment is effective, the researcher may want to test more- 
specific hypotheses regarding the treatment, such as isolating its effective 
components by dismantling the intervention (see Rapid Reference 5.3), 
examining its effectiveness with other populations, comparing it with
other types of treatment, or examining it in combination with other
interventions. Testing these hypotheses may require the use of other,
perhaps more sophisticated experimental designs.

Rapid Reference 5.3

Dismantling Studies

The term dismantling, as used in the research context, refers to studies
aimed at isolating the effective components of an intervention. In studying
specific interventions, researchers often begin by examining the
effectiveness of the overall model. However, once the model is found to be
effective, the research community will often want to know why it is
effective. To answer this question, researchers may begin dismantling the
intervention. Dismantling can be done in a variety of ways, but typically
involves a series of studies that compare an intervention with and without
certain components.



Solomon Four-Group Design 

It is perhaps easiest to understand the Solomon four-group design if we 
think of it as a combination of the randomized posttest only and pretest- 
posttest two-group designs, as depicted below. 

R — O — X1 — O

R — O — X2 — O

R         X1 — O

R         X2 — O

The principal advantage of this design is that it controls for the potential ef- 
fects of the pretest on posttest outcomes. This design allows the researcher 
to determine whether posttest differences resulted from the intervention, 
the pretest, or a combination of the treatment and the pretest. This last
possibility is an example of an interaction, which will be discussed
shortly. Importantly, this design offers the best features of both of the
two-group designs, in that it allows the researcher to examine
between-group differences at baseline, without the results being
influenced or confounded by the
pretest administration. For this reason, the Solomon four-group design can 
also be viewed as a very basic example of a factorial design (discussed in the 
next section), as it examines the separate and combined effects of more 
than one independent variable (i.e., the pretest and the intervention). 
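
The logic of the Solomon four-group design can be sketched numerically by 
treating the four posttest means as a 2 × 2 layout (pretested or not, 
intervention or control). The short Python sketch below is illustrative 
only: the posttest means are hypothetical values invented for the example, 
and the function name solomon_effects is our own label rather than part of 
any standard analysis package. 

```python
# Hypothetical posttest means for the four Solomon groups (illustrative only).
# Keys: (was the group pretested?, did the group receive the intervention?)
posttest_means = {
    (True, True): 14.0,    # R - O - X1 - O
    (True, False): 10.0,   # R - O - X2 - O
    (False, True): 12.0,   # R -     X1 - O
    (False, False): 10.0,  # R -     X2 - O
}


def solomon_effects(means):
    """Return (treatment effect, pretest effect, interaction) from the cell means."""
    # Effect of the intervention, computed separately with and without a pretest.
    effect_with_pretest = means[(True, True)] - means[(True, False)]
    effect_without_pretest = means[(False, True)] - means[(False, False)]

    # Treatment effect averaged over the two pretest conditions.
    treatment = (effect_with_pretest + effect_without_pretest) / 2

    # Pretest effect averaged over the two treatment conditions.
    pretest = (
        (means[(True, True)] + means[(True, False)])
        - (means[(False, True)] + means[(False, False)])
    ) / 2

    # Interaction: does the treatment effect depend on having been pretested?
    interaction = effect_with_pretest - effect_without_pretest
    return treatment, pretest, interaction


treatment, pretest, interaction = solomon_effects(posttest_means)
print(treatment, pretest, interaction)  # prints: 3.0 1.0 2.0
```

With these made-up numbers, the intervention appears to work better when a 
pretest has been given (a nonzero interaction), which is exactly the 
possibility that the two-group designs cannot detect. A real analysis would 
also test whether such differences are statistically reliable. 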

Factorial Design 

Most outcomes in research are likely to have several causes that interact 
with each other in a variety of ways that cannot be identified through the 
use of two-group experimental designs. For example, as discussed, the 
two-group pretest-posttest design might result in an undetectable 
interaction effect (see Rapid Reference 5.4 and Figure 5.1) between the 
pretest and the independent variable, such that posttest differences, if 
found, could not be confidently attributed to the independent variable. The 
Solomon four-group design, which may also be viewed as a factorial design, 
is able to control for this potential interaction. The primary advantage of 
factorial designs is that they enable us to empirically examine the effects 
of more than one independent variable, both individually and in 
combination, on the dependent variable, as depicted in the following 
illustration. The design, as its name implies, allows us to examine all 
possible combinations of factors in the study: 

R— X1— Y1— O 

R— X1— Y2— O 

R— X2— Y1— O 

R— X2— Y2— O 

Rapid Reference 5.4 

Interaction Effects 

An interaction effect is the result of two or more independent variables 
combining to produce a result different from those produced by either 
independent variable alone. An interaction effect occurs when the effect of 
one independent variable differs across the levels of at least one other 
independent variable. Interactions can be found only in those factorial 
designs that include two or more independent variables. When reviewing the 
results of a factorial study, we begin by determining whether there are any 
significant interactions. If significant interactions are found, we can no 
longer interpret the main effects (i.e., between-group differences for 
either independent variable alone) in isolation, because they (as a result 
of the interaction) are determined to vary across levels of the other 
independent variable(s). This is illustrated in Figure 5.1, where the dose 
of a specific intervention is found to interact with client gender on the 
client success rate. 

In this example, we cannot interpret the main effects of gender or dose 
(on client success rate) because they vary as a function of each other. We 
can interpret only the interaction, which appears to indicate that males 
are more successful with lower doses, while females are more successful 
with higher doses. 

Figure 5.1 An example of an interaction effect. 



To further illustrate the utility of this design, let us consider a situation 
in which a researcher is interested in examining how both treatment dose 
(4 vs. 8 sessions) and treatment setting (client's home vs. clinical setting) 
influence the effectiveness of a particular intervention. Although the re- 
searcher could conduct separate two-group randomized studies, this 
would not provide information on the potential interaction of different 
doses of treatment with different treatment settings. The researcher 
might, for example, want to test the hypothesis that higher doses of treat- 
ment provided in a clinical setting will result in the best treatment out- 
comes. To best examine this hypothesis, the researcher could make use of 
a factorial design. This specific example would be considered a two-by-two 
(2 × 2) factorial design, because each of the two independent variables 
has two levels, as illustrated here: 











                              Dose 

              Low (4 weeks)        High (8 weeks) 

Home 

Clinical 







Following this same notation, a study with two independent variables in 
which one independent variable had three levels and the other had two 
levels would be considered a two-by-three (2 × 3) factorial design. 
Similarly, a study with three two-level independent variables would be 
considered a two-by-two-by-two (2 × 2 × 2) factorial design. Although a 
study could have any number of independent variables with any number of 
levels, it is important to note that each additional independent variable 
multiplies the number of groups by its number of levels. Where a 2 × 2 
design has four groups, a 2 × 2 × 3 design will have 12 groups. 
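
The way the number of groups grows can be made concrete by listing every 
cell of a design. The following Python sketch is only an illustration: the 
helper name factorial_cells and the level labels are our own, chosen to 
mirror the dose-by-setting example above. 

```python
from itertools import product
from math import prod


def factorial_cells(factors):
    """List every combination of levels for a dict of {factor name: levels}."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# The 2 x 2 example: treatment dose crossed with treatment setting.
design_2x2 = {
    "dose": ["4 sessions", "8 sessions"],
    "setting": ["home", "clinical"],
}

cells = factorial_cells(design_2x2)
for cell in cells:
    print(cell)

# The number of groups is the product of the numbers of levels.
print(len(cells))        # 2 * 2 = 4
print(prod([2, 2, 3]))   # a 2 x 2 x 3 design has 12 groups
```

Each dictionary printed by the loop corresponds to one group to which 
participants would be randomly assigned. 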

The factorial design has several important strengths. First, it permits 
the simultaneous examination of more than one independent variable. 
This can be critical because most, if not all, human behavior is determined 
by more than one variable. A second and related strength is the efficiency 
of the factorial design. Because it allows us to test several hypotheses in a 
single research study, it can be more economical to use a factorial design 
than to conduct several individual studies, in terms of both number of 
participants and researcher effort. Last, and perhaps most important, the 
factorial design allows us to look for interactions between independent 
variables. Just as most human behavior is influenced by more than one 
variable, it is equally probable that no combination of variables influences 
all persons in the same manner or influences human behavior the same 
way in all possible conditions. In other words, there are no universal truths. 
It is therefore critical to examine between-variable interactions to more 
accurately describe causal relationships (Fisher, 1953; Ray & Ravizza, 
1988). 

Are Experimental Designs Perfect? 

Despite their seemingly ideal nature, even studies that employ experimen- 
tal designs may face threats to validity in certain situations (Cook & Camp- 
bell, 1979). Threats to validity will be discussed in detail in Chapter 6, so 
we will not spend too much time discussing them in this chapter. We will, 
however, introduce you to some of the more common threats to validity. 
The first such threat occurs when a study's control group is inadvertently 
exposed to the intervention or when key aspects of the intervention also 
exist in the control group. This can substantially diminish the unique as- 
pects of an experimental intervention and reduce any potential between- 
group differences. 

Another situation that may threaten a study's validity (even with ran- 
domized experimental designs) occurs when one of the groups is per- 
ceived by participants as better or more desirable than the other. If partic- 
ipants in one condition feel that those in the other condition are somehow 
receiving superior treatment, they may experience feelings of resentment 
toward the researcher, may feel demoralized, or may even try harder or 
change their behavior to compensate. When condition assignment affects 
participant behavior in this manner, a contrast effect has occurred. Contrast 
effects can have a substantial impact on a study's findings. 

Still another potential threat to the validity of an experimental design 
occurs when there are substantial differences in the implementation of 
the experimental and control conditions. For example, this may occur if 
the clinician delivering the experimental treatment were far more experi- 
enced or educated than the one delivering the control treatment. This 
could obviously confound the study's findings by diminishing the 
researcher's ability to attribute any measured change to the experimental 
intervention. 

Finally, and very importantly, experimental designs are also not immune 
to the effects of differential participant mortality (or dropout). This is par- 
ticularly likely when one of the conditions is noxious or onerous. Regard- 
less of randomization, participant dropout can substantially reduce a 
study's internal validity by systematically creating two or more very differ- 
ent groups and ultimately undoing what randomization initially achieved. 

Another important point about randomized experimental designs is 
that randomization, while far superior to other methods in ensuring that 
extraneous variables are distributed equally across groups, does not always 
work. This is of particular concern when sample sizes are small (i.e., fewer 
than 40 participants per group). Although researchers may attempt to ex- 
amine the integrity of randomization by comparing the study groups on a 
number of pretest measures, they can never be certain that differences do 
not exist. Ironically, because they lack sufficient statistical power (i.e., the 
ability to detect between-group differences if differences actually exist), 
studies with small sample sizes are less likely to find between-group dif- 
ferences on such measures (Kazdin, 2003c). 
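
The point that randomization is less dependable in small samples can be 
explored with a brief simulation. The sketch below rests on assumptions of 
our own: a single standard-normal baseline variable, an arbitrary cutoff of 
half a standard deviation for a "noticeable" imbalance, and a hypothetical 
function name (chance_imbalance_rate). It is meant to illustrate the idea, 
not to set a rule about required sample sizes. 

```python
import random


def chance_imbalance_rate(n_per_group, threshold=0.5, trials=2000, seed=0):
    """Estimate how often two randomly assigned groups differ on a baseline
    variable by more than `threshold` standard deviations purely by chance."""
    rng = random.Random(seed)
    imbalanced = 0
    for _ in range(trials):
        # Simulate a standard-normal baseline score for every participant.
        scores = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
        # Random assignment: shuffle, then split into two equal groups.
        rng.shuffle(scores)
        group_a = scores[:n_per_group]
        group_b = scores[n_per_group:]
        mean_a = sum(group_a) / n_per_group
        mean_b = sum(group_b) / n_per_group
        if abs(mean_a - mean_b) > threshold:
            imbalanced += 1
    return imbalanced / trials


rates = {n: chance_imbalance_rate(n) for n in (10, 40, 160)}
for n, rate in rates.items():
    print(f"{n} per group: imbalance in {rate:.1%} of simulated studies")
```

Under these assumptions, sizable baseline differences arise fairly often 
with 10 participants per group and become rare as the groups grow, which is 
why checks on the integrity of randomization matter most in small studies. 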

The most obvious limitation of studies that employ a randomized ex- 
perimental design is their logistical difficulty. Randomly assigning partici- 
pants in certain settings (e.g., criminal justice, education) may often be 
unrealistic, either for logistical reasons or simply because it may be con- 
sidered inappropriate in a particular setting. Although efforts have been 
made to extend randomized designs to more real-world settings, it is often 
not feasible. In such cases, the researcher often turns to quasi-experi- 
mental designs. 

QUASI-EXPERIMENTAL DESIGNS 

As just noted, although random assignment is the best way to ensure the 
internal validity of a research study, it is often not feasible in real-world 
environments. Therefore, when randomized designs are not feasible, 
researchers must often make use of quasi-experimental designs. A good rule 
of thumb is that researchers should attempt to use the most rigorous 
research design possible, striving to use a randomized experimental design 
whenever possible (Campbell, 1969). 

Cook and Campbell (1979) present a variety of quasi-experimental 
designs, which can be divided into two main categories: nonequivalent 
comparison-group designs and interrupted time-series designs. In this 
section, we will discuss these two major groups of quasi-experimental 
designs, followed by a brief overview of single-subject designs. 

Nonequivalent Comparison-Group Designs 

Nonequivalent comparison-group designs are among the most com- 
monly used quasi-experimental designs. Structurally, these designs are 
quite similar to the experimental designs, but an important distinction is 
that they do not employ random assignment. In using these designs, the 
researcher attempts to select groups that are as similar as possible. Unfor- 
tunately, as indicated by the design's name, it is likely that the resulting 
groups will be nonequivalent. With careful analysis and cautious interpre- 
tation, however, nonequivalent comparison-group designs may still lead 
to some valid conclusions (Graziano & Raulin, 2004). 

Nonequivalent Groups Posttest-Only (Two or More Groups) 

In the nonequivalent groups posttest-only design, one group (the experi- 
mental group) receives the intervention while the other group (the control 
group) does not, as depicted here (NR = not randomized): 

NR— X1— O 

NR— X2— O 

Unfortunately, there is a low probability that any resulting between-group 
differences on the dependent variable could be attributed to the interven- 
tion, so the results of a study using this design may be considered largely 
uninterpretable. 

One potential application of this design (Cook & Campbell, 1979; 
McGuigan, 1983) is a case in which each of the groups might represent a 
different type of teaching method. If differences are found in the resulting 
test scores of students, it may suggest that the specific teaching method 
caused the differences. However, it is equally possible that students who 
were likely to achieve higher grades were selected for a specific teaching 
method. Ultimately, even this variation cannot rule out the serious threats 
to internal validity that plague this design. 

Nonequivalent Groups Pretest-Posttest (Two or More Groups) 

In the nonequivalent groups pretest-posttest design, the dependent vari- 
able is measured both before and after the treatment or intervention, as 
depicted here: 

NR— O— X1— O 

NR— O— X2— O 

This gives it two advantages over its posttest-only counterpart. First, with 
the use of both a pretest and a posttest, the temporal precedence of the in- 
dependent variable to the dependent variable can be established. This may 
give the researcher more confidence when inferring that the independent 
variable was responsible for changes in the dependent variable. Second, 
the use of a pretest allows the researcher to measure between-group dif- 
ferences before exposure to the intervention. This could substantially re- 
duce the threat of selection bias by revealing whether the groups differed 
on the dependent variable prior to the intervention. 
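
To see why the pretest matters, it can help to work through a small 
numerical case. The Python sketch below uses hypothetical scores for two 
intact (nonrandomized) groups; the data, the dictionary keys, and the 
function name summarize are all invented for illustration. 

```python
from statistics import mean

# Hypothetical pretest and posttest scores for two intact groups.
treatment = {"pre": [10, 12, 11, 13], "post": [15, 17, 15, 17]}
comparison = {"pre": [14, 15, 13, 14], "post": [15, 16, 14, 15]}


def summarize(treated, compared):
    """Compare the groups at baseline, at posttest, and on pre-to-post gains."""
    gain_treated = mean(treated["post"]) - mean(treated["pre"])
    gain_compared = mean(compared["post"]) - mean(compared["pre"])
    return {
        # Difference between groups before the intervention (selection check).
        "baseline_gap": mean(treated["pre"]) - mean(compared["pre"]),
        # What a posttest-only design would have observed.
        "posttest_gap": mean(treated["post"]) - mean(compared["post"]),
        # How much more the treated group changed than the comparison group.
        "difference_in_gains": gain_treated - gain_compared,
    }


for name, value in summarize(treatment, comparison).items():
    print(name, value)
```

In this made-up case the treated group starts 2.5 points lower, so the 
1-point posttest difference understates how much more that group changed 
(3.5 points). The comparison of gains still cannot rule out other 
explanations, because the groups were not formed by random assignment. 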

Interrupted Time-Series Designs 

The time-series design is perhaps best described as an extension of a one- 
group pretest-posttest design — the design is extended by the use of nu- 
merous pretests and posttests. In this type of quasi-experimental design, 
periodic measurements are made on a group prior to the presentation (in- 
terruption) of the intervention to establish a stable baseline. Observing 
and establishing the normal fluctuation of the dependent variable over 
time allows the researcher to more accurately interpret the impact of the 
independent variable. Following the intervention, several more periodic 
measurements are made. There are four basic variations of this design: 
the simple interrupted time-series design, the reversal time-series design, 
the multiple time-series design, and the longitudinal design. 

Simple Interrupted Time-Series Design 

The simple interrupted time-series design is a within-subjects design in which pe- 
riodic measurements are made on a single group in an effort to establish a 
baseline, as depicted here: 

O— O— O— O— X— O— O— O— O 

At some point in time, the independent variable is introduced, and it is 
followed by additional periodic measurements to determine whether a 
change in the dependent variable occurs. 

According to Cook and Campbell (1979), there are two principal ways 
in which the independent variable can influence the series of observations 
after it has been introduced: (1) a change in the level and (2) a change in 
the slope. A sharp discontinuity in the values of the dependent variable at 
the point of interruption (introduction of the independent variable) would 
indicate a change in level. 

To better understand this, consider a study in which an employer was 
using a particular rating system to evaluate the employees' monthly 
productivity, before and after offering them stock options. One potential 
outcome might be a dramatic change in employee productivity. As depicted 
in Figure 5.2, employee productivity ratings that hovered between 2 and 3 
prior to the availability of stock options might abruptly rise to the 5-6 
range following the company offer. Alternatively, as depicted in Figure 5.3, 
the employer might find a steady increase in productivity following the 
company bonus. 

Figure 5.2 An example of a change in level. (Horizontal axis: Consecutive 
Observations.) 

Figure 5.3 An example of a change in slope. (Horizontal axis: Consecutive 
Observations.) 

In addition to the level and slope, the researcher can examine the dura- 
tion of effects and whether they ultimately persist or decay over time. Fi- 
nally, the researcher can examine the ultimate latency of effects and 
whether the effect was immediate or delayed. The more immediate the 
change in the dependent variable, the more likely that the change is due to 
the influence of the independent variable. The ability to examine changes 
and trends across a series of observations made before and after the inter- 
vention permits the researcher to more closely identify the possibility of 
maturation, testing, and history as alternative explanations. (Maturation, 
testing, and history are discussed further in Chapter 6.) 
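
A change in level and a change in slope can be estimated in several ways. 
One simple approach, sketched below in Python, is to fit a separate 
straight line to the observations before and after the interruption and 
then compare the two lines. The productivity ratings are hypothetical, the 
function names are our own, and the sketch ignores complications such as 
correlated observations that a full time-series analysis would address. 

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one segment."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def level_and_slope_change(series, n_pre):
    """Compare lines fitted before and after the interruption.

    The level change is the gap between the two fitted lines at the first
    post-intervention observation; the slope change is the difference
    between the two fitted slopes.
    """
    times = list(range(len(series)))
    pre_slope, pre_intercept = fit_line(times[:n_pre], series[:n_pre])
    post_slope, post_intercept = fit_line(times[n_pre:], series[n_pre:])
    t0 = n_pre  # time index of the first observation after the interruption
    level_change = (post_intercept + post_slope * t0) - (pre_intercept + pre_slope * t0)
    return level_change, post_slope - pre_slope


# Hypothetical monthly ratings: four before and four after the stock offer.
level_shift = [2.5, 2.5, 2.5, 2.5, 5.5, 5.5, 5.5, 5.5]   # abrupt jump
slope_shift = [2.5, 2.5, 2.5, 2.5, 2.5, 3.0, 3.5, 4.0]   # steady increase

print(level_and_slope_change(level_shift, n_pre=4))  # (3.0, 0.0)
print(level_and_slope_change(slope_shift, n_pre=4))  # (0.0, 0.5)
```

The first series shows a change in level only, and the second a change in 
slope only. Real data would rarely be this tidy, and neither estimate by 
itself rules out history, maturation, or testing as explanations. 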

Although changes in either level or slope are often used as the basis 
for inferring a causal relationship between the independent and depen- 
dent variables, such inferences must be made with extreme caution be- 
cause this design does little to control for alternative explanations for 
measured change. For instance, in the prior example, it may have been 
the employer's attention rather than the bonus that led to increased 
employee productivity. Consequently, this design does not permit a 
researcher to draw causal inferences with any substantial degree of 
certainty. 

Reversal Time-Series Design 

Also known as an ABA design, the reversal time-series design is basically a 
multi-subject variation of the single-subject reversal design, which will 
be discussed later in this chapter. The basic goal of this 
design is to establish causality by presenting and withdrawing an interven- 
tion, or independent variable, one to several times while concurrently 
measuring change in the dependent variable (as depicted in the following). 
As in the simple time-series design, this design begins with a series of 
pretests to observe normal fluctuations in baseline. The name "reversal" 
refers to the idea that causality can be inferred if changes that occur fol- 
lowing the presentation of an intervention diminish or "reverse" when the 
independent variable is withdrawn. 

O— O— O— X— O— O— O— REV— O— O— O— X— O— O— O 

(A) (B) (A) 

To fully appreciate the elegance of this design, consider the prior ex- 
ample in which an employer offers a company bonus. Imagine if, rather 
than offering a one-time bonus, the employer offered a monthly bonus to 
employees for 2 months, removed it for 2 months, and then again offered 
it for 2 months. If increases in productivity were found following each 
bonus, and decreases in productivity were found each time the bonus was 
removed, one could be fairly confident that company bonuses influenced 
employee productivity. 
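
The reasoning in this example can be expressed as a simple check on phase 
averages. The Python sketch below uses hypothetical monthly productivity 
ratings, with "A" marking months without the bonus and "B" marking months 
with it; the function names are our own, and comparing phase means is only 
one rough way of judging whether a reversal occurred. 

```python
from itertools import groupby
from statistics import mean

# Hypothetical monthly ratings: baseline, bonus, bonus removed, bonus again.
observations = [
    ("A", 2.4), ("A", 2.6),
    ("B", 4.8), ("B", 5.2),
    ("A", 2.7), ("A", 2.5),
    ("B", 5.0), ("B", 5.4),
]


def phase_means(obs):
    """Average the ratings within each consecutive run of the same phase."""
    return [
        (label, mean(value for _, value in group))
        for label, group in groupby(obs, key=lambda item: item[0])
    ]


def shows_reversal(phases):
    """True if ratings rise on entering each B phase and fall on each return to A."""
    for (_, previous), (label, current) in zip(phases, phases[1:]):
        if label == "B" and not current > previous:
            return False
        if label == "A" and not current < previous:
            return False
    return True


phase_summary = phase_means(observations)
print(phase_summary)
print(shows_reversal(phase_summary))  # True for these made-up ratings
```

A series in which the ratings stayed high after the bonus was removed, as 
with a skill that cannot be unlearned, would fail this check. 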

Despite the elegance of the reversal design, it is similar to its single- 
subject counterpart (to be discussed) in that it is not appropriate for the 
study of all independent or dependent variables. The fact is that the effects 
of some interventions simply cannot be reversed, as with learning to read 
or learning to ride a bike. You can offer and remove instruction on these 
skills as often as you like and you are still likely to observe a learning curve, 
with little reversal. It is therefore necessary for the researcher to carefully 
consider the characteristics of the independent variable to be studied 
when considering the use of this design. 

Multiple Time-Series Design 

This design is essentially the same as the nonequivalent pretest-posttest 
design, with the exception that the dependent variable is measured at mul- 
tiple time points both before and after presentation of the independent 
variable, or longitudinally (see Rapid Reference 5.5), as depicted here: 

O— O— O— O— X1— O— O— O— O 

O— O— O— O— X2— O— O— O— O 

Although this design is not randomized, it can be quite strong in terms of 
its ability to rule out other explanations for the observed effect. This 
design enables us to examine trends in the data, at multiple time points, 
before, during, and after an intervention (allowing us to evaluate the 
plausibility of certain threats to internal validity). Its major strength 
over the single-group time-series design is that it permits both 
within-group and between-group comparisons, which may further reduce 
concerns about alternative explanations associated with history. 
Regrettably, this design does not involve random assignment and thus is 
unable to eliminate all threats to internal validity. 



Rapid Reference 5.5 



Longitudinal Designs 

Longitudinal designs involve taking multiple measurements of each study 
participant over time. Generally, the purpose of longitudinal studies is to 
follow a case or group of cases over a period of time to gather normative 
data on growth, to plot trends, or to observe the effects of special factors. 
For example, a researcher may want to study the development of more 
than one birth cohort (i.e., a group of individuals born in the same 
calendar year or group of years) to determine whether personality features 
are stable over time. 

Single-Subject Experimental Designs 

Not to be confused with nonexperimental single-subject case studies, 
which are covered later in this chapter, the single-subject experimental de- 
sign has a long and respected tradition in empirical research. According to 
Kazdin (2003c), single-subject experiments might be seen as true experi- 
ments because they "can demonstrate causal relationships and can rule 
out or make implausible threats to validity with the same elegance of 
group research" (p. 273). Similar to other experimental designs, the single- 
subject design seeks to (1) establish that changes in the dependent variable 
occur following introduction of the independent variable (temporal 
precedence) and (2) identify differences between study conditions. 

The one way that single-subject designs differ from other experimental 
designs is in how they establish control, and thereby demonstrate that 
changes in a dependent variable are not due to extraneous variables. For 
example, experimental designs rely on randomization to equally distribute 
extraneous variables and on statistical techniques to control for such 
factors if they are found. Alternatively, single-subject designs eliminate 
between-subject variables by using only one participant, and they control 
for relevant environmental factors by establishing a stable baseline of the 
dependent variable. If change occurs following the introduction of the in- 
tervention, or independent variable, the researcher can reasonably assume 
that the change was due to the intervention and not to extraneous factors. 

As with time-series designs, single-subject designs typically begin by es- 
tablishing a stable baseline. Establishing a stable baseline involves taking re- 
peated measures of a participant's behavior (dependent variable) prior to 
the administration of any intervention to make certain that the partici- 
pant's behavior is occurring at a consistent rate. To obtain a stable base- 
line, the researcher must make special efforts to control all relevant envi- 
ronmental variables that otherwise might affect the participant's 
responses. If the researcher does not know, or is uncertain, about which 
variables are relevant, the researcher must attempt to keep the 
participant's environment as constant as possible by maintaining highly 
controlled conditions. 

Single-Subject Reversal Design 

The reversal design (also known, like the reversal time-series design, as the 
ABA) is one of the most widely used single-subject designs. As in the re- 
versal time-series design, the single-subject reversal design measures behavior 
during three phases: before the intervention is introduced (A), after intro- 
ducing the intervention (B), and again after withdrawing the intervention 
(A). The primary goal of this design is, first, to determine whether there 
is a change in the dependent variable following the introduction of the 
independent variable; and second, to determine whether the dependent 
variable reverses or returns to baseline once the independent variable is 
withdrawn. To rule out the possibility that apparent effects might be due 
to a certain cyclical pattern involving either maturation or practice (to be 
discussed in Chapter 6), the ABA design may be extended to an ABAB 
design. To rule out even more complicated maturation or practice effects, 
the researcher could extend the design even further to an ABABA. Obvi- 
ously, the more measurements that are made, the less likely it is that 
measured change is due to anything other than the intervention, or inde- 
pendent variable. 

The single-subject reversal design has the same limitations as its time- 
series counterpart. First, and most obviously, not all behaviors are re- 
versible. Certain behaviors, such as reading, riding a bike, or learning a lan- 
guage, are somewhat permanent. Second, withdrawal of certain useful 
interventions or curative treatments may be unethical. To address this is- 
sue, many studies opt for the ABAB variant, in which the intervention is 
repeated and is designated as the final condition. 

Single-Subject Multiple-Baseline Design 

A second, very common single-subject approach is the multiple-baseline de- 
sign. This design demonstrates the effectiveness of a treatment by showing 
that behaviors across more than one baseline change as a consequence of 
the introduction of a treatment. In this design, several behaviors of a single 
subject are monitored simultaneously. Once stable baselines are 
established for all of the behaviors, one of the behaviors is exposed to 
the intervention. The primary goal of this design is to determine whether 
the behavior that is exposed to the intervention changes while the other 
behaviors remain constant. Once the first behavioral shift is identified, the 
intervention is applied to the next behavior, and so on. The logic behind 
this design is that it would be highly unlikely for baseline behaviors to suc- 
cessively shift by chance. 

For example, suppose a tutor wants to test whether providing small 
prizes or rewards can change two distinct behaviors that one of her stu- 
dents is displaying (i.e., asking questions, and attending tutoring sessions 
on time). The tutor, after establishing a stable baseline for both behaviors, 
observes that the student asks an average of 3 questions per week, and at- 
tends tutoring sessions on time an average of 2 times per week. The tutor 
might begin by giving the student prizes for asking questions regardless of 
her tardiness for the first two weeks. At this point, the tutor may find that 
the student begins to ask an average of 5 questions per week, while her tar- 
diness remains the same. After two weeks, the tutor might also begin giv- 
ing the student prizes for attending her tutoring sessions on time. In other 
words, the tutor might begin rewarding both behaviors. After another two 
weeks, the tutor might observe that the student's average rate of question- 
asking remains at 5 times per week, but that her average on-time atten- 
dance increases to 4 times per week. 
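
The logic of the tutoring example can be laid out in a few lines of Python. 
The weekly averages below are the ones given in the example; the phase 
labels, the function name changes_only_at_onset, and the use of exact 
equality to define a "stable" behavior are simplifying assumptions of our 
own, since real observations would fluctuate from week to week. 

```python
# Weekly averages from the tutoring example, listed by phase.
phases = ["baseline", "reward questions", "reward questions and punctuality"]
averages = {
    "questions asked": [3, 5, 5],
    "on-time sessions": [2, 2, 4],
}

# Index of the phase in which each behavior first receives the intervention.
onset = {"questions asked": 1, "on-time sessions": 2}


def changes_only_at_onset(series, onset_index):
    """True if the behavior is unchanged in every phase before its own
    intervention begins and shifts in the phase where it is introduced.
    Assumes onset_index is at least 1 (there is a baseline phase)."""
    stable_before = all(series[i] == series[0] for i in range(onset_index))
    shifts_at_onset = series[onset_index] != series[onset_index - 1]
    return stable_before and shifts_at_onset


results = {
    behavior: changes_only_at_onset(series, onset[behavior])
    for behavior, series in averages.items()
}
print(results)
```

Both behaviors shift only when the prizes are applied to them, which is the 
successive pattern that the multiple-baseline design looks for. 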

The primary limitation of the multiple-baseline design is that it requires 
the use of relatively independent behaviors. The behaviors that are being 
monitored must not be so interrelated that a change in one behavior re- 
sults in similar changes in others even though the other behaviors were not 
exposed to the intervention. For example, Kazdin (1973) points out that 
the design would not be useful for the study of children's classroom be- 
haviors because many of the classroom behaviors are interrelated. 

Overall, single-subject designs may be an important and logical alterna- 
tive to randomized experimental designs. Importantly, because of their fo- 
cus on single-subject behavior, these designs may be particularly suited for 
clinicians who want to determine whether certain treatments are working 
for specific clients or patients. 

In this section, we have provided a brief overview of several of the most 
widely used quasi-experimental designs. However, many other 
quasi-experimental designs are available. In fact, there appears to be a nearly 
endless number of ways to arrange the independent and dependent vari- 
ables in an attempt to answer experimental questions with some degree of 
confidence. Unfortunately, despite their often elegant structure, quasi- 
experimental designs cannot automatically rule out threats to internal va- 
lidity with the same degree of certainty that true experimental designs can. 
At this point, however, the overall utility of quasi-experimental designs 
should be evident. Although they do not enable us to draw causal infer- 
ences with the same degree of confidence as do randomized designs, they 
do allow us to begin to examine real-world phenomena and begin to es- 
tablish causal inferences when true experimental designs are simply not 
feasible. 



NONEXPERIMENTAL OR QUALITATIVE DESIGNS 

In the past two sections, we discussed experimental and quasi-experi- 
mental designs. Each of these design classes can provide information 
from which to draw causal inferences, although to very different degrees 
of certainty. This is not the case for nonexperimental designs (i.e., de- 
scriptive and correlational designs). No matter how convincing the data 
from descriptive and correlational studies may appear, these nonexperi- 
mental designs cannot rule out extraneous variables as the cause of what 
is being observed because they do not have control over the variables and 
the environments that they study. Although there are many types of non- 
experimental methods, an extensive review of these techniques and de- 
signs is beyond the scope of this chapter. Therefore, we will provide a brief 
overview of four of the most widely used approaches: case studies, natu- 
ralistic observation, surveys, and focus groups. 

Case Studies 

Case studies involve an in-depth examination of a single person or a few 
people. The goal of the case study is to provide an accurate and complete 
description of the case. The principal benefit of case studies is that they 
can expand our knowledge about the variations in human behavior. 
Although experimental researchers are typically interested in overall trends 
in behavior, drawing sample-to-population inferences, and generalizing to 
other samples, the focus of the case-study approach is on individuality and 
describing the individual as comprehensively as possible. The case study 
requires a considerable amount of information, and therefore conclusions 
are based on a much more detailed and comprehensive set of information 
than is typically collected by experimental and quasi-experimental studies. 

Case studies of individual participants often include in-depth inter- 
views with participants and collaterals (e.g., friends, family members, 
colleagues), review of medical records, observation, and excerpts from 
participants' personal writings and diaries. Case studies have a practical 
function in that they can be immediately applicable to the participant's di- 
agnosis or treatment. 

According to Yin (1994), the case-study design must have the following 
five components: its research question(s), its propositions, its unit(s) of 
analysis, a determination of how the data are linked to the propositions, 
and criteria to interpret the findings. According to Kazdin (1982), the ma- 
jor characteristics of case studies are the following: 

• They involve the intensive study of an individual, family, group, 
institution, or other level that can be conceived of as a single unit. 

• The information is highly detailed, comprehensive, and typically 
reported in narrative form as opposed to the quantified scores on 
a dependent measure. 

• They attempt to convey the nuances of the case, including specific 
contexts, extraneous influences, and special idiosyncratic details. 

• The information they examine may be retrospective or archival. 

Although case studies lack experimental control, their naturalistic and 
uncontrolled methods have set them aside as a unique and valuable source 
of information that complements and informs theory, research, and prac- 
tice (Kazdin, 2003c). According to Kazdin, case studies may be seen as 
having made at least four substantial contributions to science: They have 
served as a source of research ideas and hypotheses; they have helped to 
develop therapeutic techniques; they have enabled scientists to study extremely
rare and low-base-rate phenomena, including rare disorders and
one-time events; and they can describe and detail instances that contradict 
universally accepted beliefs and assumptions, thereby serving to plant 
seeds of doubt and spur new experimental research to validate or invali- 
date the accepted beliefs. 

Case studies also have some substantial drawbacks. First, like all nonex- 
perimental approaches, they merely describe what occurred, but they can- 
not tell us why it occurred. Second, they are likely to involve a great deal of 
experimenter bias (refer back to Chapter 3). Although no research design, 
including the randomized experimental designs, is immune to experi- 
menter bias, some, such as the case study, are at greater risk than others. 

The reason the case study is more at risk with respect to experimenter 
bias is that it involves considerably more interaction between the re- 
searcher and the participant than most other research methods. In addi- 
tion, the data in a case study come from the researcher's observations of 
the participant. Although this might also be supplemented by test scores 
and more objective measures, it is the researcher who brings all this to- 
gether in the form of a descriptive case study of the individual(s) in ques- 
tion. 

Finally, the small number of individuals examined in these studies 
makes it unlikely that the findings will generalize to other people with sim- 
ilar issues or problems. A case study of a single person diagnosed with a 
certain disorder is unlikely to be representative of all individuals with that 
disorder. Still, the overall contributions of the case study cannot be ignored.
Regardless of its nonexperimental approach (in fact, because of its
nonexperimental approach), it has substantially informed theory, research,
and practice, serving to fulfill the first goal of science, which is to
identify issues and causes that can then be experimentally assessed.

Naturalistic Observation 

Naturalistic observation studies, as their name implies, involve observing
organisms in their natural settings. For example, a researcher who wants to
examine the socialization skills of children may observe them while they
are at a school playground, and then record all instances of effective or
ineffective social behavior. The primary advantage of the naturalistic
observation approach is that it takes place in a natural setting, where the
participants do not realize that they are being observed. Consequently, the
behaviors that it measures and describes are likely to reflect the
participants' true behaviors.

Putting It Into Practice

A Refresher on Eliminating Experimenter Bias

As discussed in Chapter 3, there are several effective strategies for reducing
or eliminating the effects of experimenter bias. The first strategy is to
develop and employ highly specific study procedures. Using clearly
operationalized and standardized procedures can reduce the opportunity for
bias to influence the way that study participants are treated and the way
that data are considered or analyzed. A second strategy is to reduce or
eliminate experimenter-participant interactions. For example, studies
could be conducted via the Internet, or participants could receive study
instructions and assessments via computer (Kazdin, 2003c). A third strategy
is to keep the researcher unaware of participants' specific group
assignments, typically referred to as making the researcher blind or naive.
Although this may be easiest in medication studies in which participants
receive either a placebo or a real medication, it can (with a bit more
effort) be employed in other studies. For example, a study could use
multiple researchers within sessions, so that those who deliver the
interventions are aware of the group assignments and those who administer the
dependent measure are not.

In general, naturalistic observation has four defining principles (Ray &
Ravizza, 1988). The first and most fundamental principle is that of nonin- 
terference. Researchers who engage in naturalistic observation must not dis- 
rupt the natural course of events that they are observing. By adhering to 
this principle, researchers can observe events the way they truly happen. 
Second, naturalistic observation involves the observation and detection of 
invariants, or behavior patterns or other phenomena that exist in the real 
world. For example, individuals may be found to behave in similar ways
at certain times or on certain days, in certain contexts, or when in the company of
certain people or groups. Third, the naturalistic observation approach is 
particularly useful for exploratory purposes, when we know little or noth- 
ing about a certain subject. In this vein, naturalistic observation can pro- 
vide a useful but global description of the participant and a series of events 
as opposed to isolated ones. Finally, the naturalistic observation method 
is basically descriptive. Although it can provide a somewhat detailed de- 
scription of a phenomenon, it cannot tell us why the phenomenon oc- 
curred. Determining causation is left to experimental designs, which were 
discussed in detail earlier in this chapter. 

The main limitation of the naturalistic approach is that the researcher 
has no real control over the setting. In the hypothetical study of children's 
socialization skills, factors other than a child's gender may be affecting the 
child's social behavior, but the researcher may not be aware of those other 
factors. In addition, participants may not have an opportunity to display 
the behaviors or phenomena the researcher is trying to observe because of 
factors that are beyond the researcher's control. For example, some of the 
children who are usually the most aggressive may not be at school that day 
or may instead be in detention because of previous misconduct, and thus 
they are not in the sample of children on the playground. A final limitation 
is that the topics of study are limited to overt behavior. A researcher can- 
not study unobservable processes like attitudes or thoughts using a natu- 
ralistic observation study. 

Survey Studies 

Survey studies ask large numbers of people questions about their behaviors, 
attitudes, and opinions. Some surveys merely describe what people say 
they think and do. Other survey studies attempt to find relationships be- 
tween the characteristics of the respondents and their reported behaviors 
and opinions. For example, a survey could examine whether there is a re- 
lationship between gender and people's attitudes about some social issue. 
When surveys are conducted to determine relationships, as for this second 
purpose, they are referred to as correlational studies. 
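
To make the idea of a correlational survey concrete, the following minimal sketch in Python computes a Pearson correlation between a two-category respondent characteristic and a rated attitude. The data, the 0/1 coding of gender, and the function name pearson_r are all hypothetical and chosen only for illustration; an actual analysis would normally rely on a statistical package and would also report a significance test.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical survey responses (invented for illustration):
# gender coded 0/1, attitude rated on a 1-5 agreement scale.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
attitude = [2, 3, 2, 3, 4, 5, 4, 3]

r = pearson_r(gender, attitude)
print(f"r = {r:.2f}")
```

A coefficient near zero suggests little linear relationship, whereas values approaching +1 or -1 suggest a stronger one. Consistent with the nonexperimental nature of surveys, the coefficient describes an association and says nothing about why it exists.
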

Campbell and Katona (1953) delineated nine general steps for conducting
a survey. Although this list is more than 50 years old, it is as useful
now as it was then in providing a clear overview of survey procedures. The 
nine steps are as follows: 

1. General objectives: This step involves defining the general purpose
and goal of the survey. 

2. Specific objectives: This step involves developing more specificity 
regarding the types of data that will be collected, and specifying 
the hypothesis to be tested. 

3. Sample: The major foci of this step are to determine the specific 
population that will be surveyed, to decide on an appropriate 
sample, and to determine the criteria that will be used to select 
the sample. 

4. Questionnaire: The focus of this step is deciding how the sample 
is to be surveyed (e.g., by mail, by phone, in person) and devel- 
oping the specific questions that will be used. This is a particu- 
larly important step that involves determining the content and 
structure (e.g., open-ended, closed-ended, Likert scales; see 
Rapid Reference 5.6) of the questions, as well as the general for- 
mat of the survey instrument (e.g., scripted introduction, order 
of the questions). Importantly, the final survey should be sub- 
jected to a protocol analysis in which it is administered to nu- 
merous individuals to determine whether (a) it is clear and 
understandable and (b) the questions get at the type of 
information that they were designed to collect. For certain 
scales, such as Likert scales, you may also want to look for cer- 
tain response patterns to see whether there is a problematic re- 
sponse set that emerges, as indicated by restricted variability in 
responses (e.g., all items rated high, all items rated low, or all 
items falling in between). 

5. Fieldwork: This step involves making decisions about the indi- 
viduals who will actually administer the surveys, and about their 
qualifications, hiring, and training. 
Measurement Modalities

Three of the most common measurement modalities include open-ended
questions, closed-ended questions, and Likert scales. An open-ended
question does not provide the participant with a choice of answers. Instead,
participants are free to answer the question in any manner they
choose. An example of an open-ended question is the following: "How
would you describe your childhood?" By contrast, a closed-ended question
provides the participant with several answers from which to choose.
A common example of a closed-ended question is a multiple-choice
question, such as the following: "How would you describe your childhood?
(a) happy; (b) sad; (c) boring." Finally, a Likert scale asks participants to
provide a response along a continuum of possible responses. Here's an
example of a Likert scale: "My childhood was happy. (1) strongly agree; (2)
agree; (3) neutral; (4) disagree; (5) strongly disagree."



6. Content analysis: This involves transforming the often qualitative, 
open-ended survey responses into quantitative data. This may 
involve developing coding procedures, establishing the reliabil- 
ity of the coding procedures, and developing careful data screen- 
ing and cleaning procedures. 

7. Analysis plan: In general, these procedures are fairly straightfor- 
ward because the analysis of survey data is typically confined to 
descriptive and correlational statistics. Still, even survey studies 
should have clear statistical analysis plans. 

8. Tabulation: This step involves decisions about data entry. 

9. Analysis and reporting: As with all studies, the final steps are to 
conduct the data analyses, prepare a final report or manuscript, 
and disseminate the study's findings. 
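
The check for problematic response sets mentioned in step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pilot data are invented, the function name flag_response_sets is hypothetical, and the cutoff of 0.5 for the standard deviation is an arbitrary value that a researcher would need to justify for a particular instrument.

```python
from statistics import pstdev

def flag_response_sets(responses, min_sd=0.5):
    """Return IDs of respondents whose item ratings barely vary.

    responses: dict mapping respondent ID -> list of Likert ratings (1-5).
    min_sd: hypothetical cutoff; respondents whose ratings have a standard
            deviation below it are flagged for review, not discarded.
    """
    flagged = []
    for respondent_id, ratings in responses.items():
        if len(ratings) >= 2 and pstdev(ratings) < min_sd:
            flagged.append(respondent_id)
    return flagged

# Hypothetical pilot data: five Likert items per respondent.
pilot = {
    "r01": [5, 5, 5, 5, 5],   # same extreme rating on every item
    "r02": [1, 1, 1, 1, 1],   # same opposite extreme on every item
    "r03": [3, 3, 3, 3, 3],   # midpoint on every item
    "r04": [1, 4, 2, 5, 3],   # varied responses
}

print(flag_response_sets(pilot))
```

Flagged respondents are candidates for closer inspection only. Restricted variability can also reflect a genuinely uniform opinion, so the decision to exclude a case remains a judgment call.
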

Although a variety of methods for administering surveys are available, 
the most popular are face-to-face, telephone, and mail. In general, each of 
these methods has its own advantages and disadvantages. The major consideration
for the researcher in deciding on the form of survey administration
is response rate versus cost. As a rule of thumb (Ray & Ravizza,
1988), if a high rate of return is the main goal, then face-to-face or telephone
surveys are the optimal choices, while mail surveys are the obvious choice
when cost is an issue.

The principal advantage of survey studies is that they provide informa- 
tion on large groups of people, with very little effort, and in a cost- 
effective manner. Surveys allow researchers to assess a wider variety of be- 
haviors and other phenomena than can be studied in a typical naturalistic 
observation study. 

Focus Groups 

Focus groups are formally organized, structured groups of individuals 
brought together to discuss a topic or series of topics during a specific pe- 
riod of time. Like surveys, focus groups can be an extremely useful tech- 
nique for obtaining individuals' impressions and concerns about certain 
issues, services, or products. 

Originally developed for use in marketing research, focus groups have 
served as a principal method of qualitative research among social scien- 
tists for many decades. In contrast to other, unilateral methods of obtain- 
ing qualitative data (e.g., observation, surveys), focus groups allow for in- 
teractions between the researcher and the participants and among the 
participants themselves. 

As with most other qualitative research methods, there is no one definitive
way to design or conduct a focus group. However, they are typically com- 
posed of several participants (usually 6 to 10 individuals) and a trained 
moderator. Fewer than 6 participants may restrict the diversity of the opin- 
ions to be offered, and more than 10 may make it difficult for everyone 
to express their opinions comprehensively (Hoyle, Harris, & Judd, 2002). 
Focus groups are also typically made up of individuals who share a partic- 
ular characteristic, demographic, or interest that is relevant to the topic be- 
ing studied. For example, a marketing researcher may want to conduct a 
focus group with parents of young children to determine the desirability 
of a new educational product. Similarly, a criminal justice researcher interested
in developing methods of reducing criminal recidivism may choose
to conduct focus groups with recent parolees to discuss problems that 
they encountered after being released from prison. 

The presence of a trained moderator is critical to the focus-group pro- 
cess (Hoyle et al., 2002). The moderator is directly responsible for setting 
the ground rules, raising the discussion topics, and maintaining the focus 
of the group discussions. When setting the ground rules, the moderator 
must, above all, discuss issues of confidentiality, including the confiden- 
tiality of all information shared with and recorded by the researchers (also 
covered when obtaining informed consent). In addition, the moderator 
will often request that all participants respect each other's privacy by keep- 
ing what they hear in the focus groups confidential. Other ground rules 
may involve speaking one at a time and avoiding criticizing the expressed 
viewpoints of the other participants. 

Considerable preparation is necessary to make a focus group success- 
ful. The researcher must carefully consider the make-up of the group (of- 
ten a nonrepresentative sample of convenience), prepare a list of objec- 
tives and topics to be covered, and determine clear ground rules to be 
communicated to the group participants. When considering the questions 
and topics to be covered, the researcher should again take into account the 
make-up of the group (e.g., intelligence level, level of impairment) as well 
as the design of the questions. For example, when possible, moderators 
should avoid using closed-ended questions, which may not generate a 
great deal of useful dialogue. Similarly, moderators should avoid using 
"why" questions. Questions that begin with "why" may elicit socially ap- 
propriate rationalizations, best guesses, or other attributions about an in- 
dividual's behavior when the person is unsure or unaware of the true rea- 
sons or underlying motivations for his or her behavior (Nisbett & Wilson, 
1977). Instead, it may be more fruitful to ask participants about what they 
do and the detailed events surrounding their behaviors. This may ulti- 
mately shed more light on the actual precipitants of participants' behav- 
iors. Overall, focus groups should attempt to cover no more than two to 
three major topics and should last no more than 1 1/2 to 2 hours. 

The obvious advantage of a focus group is that it provides an open, 
fairly unrestricted forum for individuals to discuss ideas and to clarify each
other's impressions and opinions. The group format can also serve to
crystallize the participants' opinions. However, focus groups also have 
several disadvantages. First, because of their relatively small sample sizes 
and the fact that they are typically not randomly selected, the information 
gleaned from focus groups may not be representative of the population 
in general. Second, although the group format may have some benefits in 
terms of helping to flesh out and distill perceptions and concerns, it is also 
very likely that an individual's opinions can be altered through group in- 
fluence. Finally, it is difficult to quantify the open-ended responses result- 
ing from focus group interactions. 

The information obtained from focus groups can provide useful in- 
sight into how various procedures, systems, or products are viewed, as well 
as the desires and concerns of a given population. For these reasons, focus 
groups, similar to other qualitative research methods, often form the start- 
ing point in generating hypotheses, developing questionnaires and sur- 
veys, and identifying the relevant issues that may be examined using more 
quantifiable research methodologies. 

SUMMARY 

In this chapter, we have provided a brief introduction to the three main 
classes of research design: experimental, quasi-experimental, and
nonexperimental/qualitative. In addition to providing a general overview of
these design types, we hope that we have given the reader a stronger ap- 
preciation for the subtleties of experimental design, and the ways that 
small variations can affect the researcher's ability to rule out alternative ex- 
planations and infer causation. We also hope to have conveyed an appro- 
priate respect for quasi- and nonexperimental designs. Although these de- 
signs do not provide researchers with the same amount of confidence in 
their conclusions, they are often necessary given the specific parameters of 
the topic under investigation or the inability to study a specific phenome- 
non in a true experimental fashion. Perhaps most important, these quasi- 
and nonexperimental designs often provide the foundation, preliminary
data, and conceptual framework from which scientifically testable hypotheses
are built.



TEST YOURSELF

1. The most important element of a true experimental design is ________
assignment.

2. If groups are perfectly matched on all known factors, the researcher can
be certain that any group differences on outcomes are due to the independent
variable. True or False?

3. In randomized two-group designs, participants are typically assigned by
random selection to either an experimental or a ________ group.

4. Reversal or ABA designs cannot be used in all instances because some
phenomena and behaviors are simply not reversible. True or False?

5. A guided discussion to explore a group's opinions and impressions on a
specific topic area is known as a ________.

Answers: 1. random; 2. False (It is still possible that any number of unknown variables may be
responsible for the group differences.); 3. control; 4. True; 5. focus group

Validity is an important term in research that refers to the conceptual
and scientific soundness of a research study (Graziano & Raulin,
2004). As previously discussed, the primary purpose of all forms of
research is to produce valid conclusions. Furthermore, researchers are in- 
terested in explanations for the effects and interactions of variables as they 
occur across a wide variety of different settings. To truly understand these 
interactions requires special attention to the concept of validity, which 
highlights the need to eliminate or minimize the effects of extraneous in- 
fluences, variables, and explanations that might detract from a study's ul- 
timate findings. 

Validity is, therefore, a very important and useful concept in all forms of 
research methodology. Its primary purpose is to increase the accuracy and 
usefulness of findings by eliminating or controlling as many confounding 
variables as possible, which allows for greater confidence in the findings of 
a given study. There are four distinct types of validity (internal validity, ex- 
ternal validity, construct validity, and statistical conclusion validity) that in- 
teract to control for and minimize the impact of a wide variety of extrane- 
ous factors that can confound a study and reduce the accuracy of its 
conclusions. This chapter will discuss each type of validity, its associated 
threats, and its implications for research design and methodology. 

INTERNAL VALIDITY 

Internal validity refers to the ability of a research design to rule out or make 
implausible alternative explanations of the results, or plausible rival
hypotheses (Campbell, 1957; Kazdin, 2003c). A plausible rival hypothesis is an
alternative interpretation of the researcher's hypothesis about the interaction
of the independent and dependent variables that provides a reasonable
explanation of the findings other than the researcher's original hypothesis
(Rosnow & Rosenthal, 2002).

Internal Validity and Plausible Rival Hypotheses

Internal validity: The ability of a research design to rule out or make
implausible alternative explanations of the results, thus demonstrating that
the independent variable was directly responsible for the effect on the
dependent variable and, ultimately, for the results found in the study.

Plausible rival hypotheses: An alternative interpretation of the
researcher's hypothesis about the interaction of the independent and
dependent variables that provides a reasonable explanation of the findings
other than the researcher's original hypothesis.

Although evidence of absolute causation is rarely achieved, the goal of 
most experimental designs is to demonstrate that the independent variable 
was directly responsible for the effect on the dependent variable and, ulti- 
mately, the results found in the study. In other words, the researcher ulti- 
mately wants to know whether the observed effect or phenomenon is due 
to the manipulated independent variable or variables or to some uncon- 
trolled or unknown extraneous variable or variables (Pedhazur & 
Schmelkin, 1991). Ideally, at the conclusion of the study, the researcher 
would like to make a statement reflecting some level of causation between 
the independent and dependent variables. By designing strong experimen- 
tal controls into a study, internal validity is increased and rival hypotheses 
and extraneous influences are minimized. This allows the researcher to at- 
tribute the results of the study more confidently to the independent variable 
or variables (Kazdin 2003c; Rosnow & Rosenthal, 2002). Uncontrolled ex- 
traneous influences other than the independent variable that could explain 
the results of a study are referred to as threats to internal validity.

Putting It Into Practice

An Example of Internal Validity and Plausible 
Rival Hypotheses 

A researcher is interested in the effectiveness of two different parental 
skills training and education programs on improving symptoms of depression
in adolescents. The researcher recruits 100 families that meet specified
inclusion criteria in the study. The primary inclusion criterion is that
the family must have an adolescent who currently meets criteria for depression.
After recruitment, the researcher then randomly assigns the
families into one of the two skills training programs. The parents receive
the interventions over a 10-week period and are then sent home to apply
the skills they have learned. The researcher reevaluates the adolescents 6 
months later to see whether there has been improvement in the adoles- 
cents' symptoms of depression. The results suggest that both groups im- 
proved. The researcher concludes that both parental skills training inter- 
ventions were effective for treating depression in adolescents. Given the 
limited information here, is this an appropriate conclusion? 

The answer, of course, is no. This study has poor internal validity because
it is impossible to say with any certainty that the independent variable 
(the two skills training classes) had an effect on the dependent variable 
(depression). There are a number of alternative rival hypotheses that have 
not been controlled for and could just as easily explain the results of the 
study. Many things could have transpired over the course of the 6 months.
For example, were certain adolescents placed on medication? Would 
they have improved without the intervention? Did their life circumstances 
change for the better? We will never know because the study has poor in- 
ternal validity and does not control for even the simplest and most obvi- 
ous alternative explanations. 

Threats to Internal Validity 

Although the terminology may vary, the most commonly encountered 
threats to internal validity are history, maturation, instrumentation, test- 
ing, statistical regression, selection biases, attrition, diffusion or imitation 
of treatment, and special treatment or reactions of controls (Christensen, 
1988; Cook & Campbell, 1979; Kazdin, 2003c; Pedhazur & Schmelkin, 
1991). Researchers must be aware that every methodological design is subject
to at least some of these potential threats and control for them accordingly.
Failure to implement appropriate controls affects the researcher's
ability to infer causality.

Threats to Internal Validity

As discussed in Chapters 3 and 5, most threats to internal validity are
controlled through statistical analyses, control and comparison groups,
and randomization. The underlying assumption of randomization as it applies
to internal validity is that extraneous factors are evenly distributed
across all groups within the study. Control groups allow for direct comparison
between experimental groups and the evaluation of suspected extraneous
influences. Statistical controls are typically used when participants
cannot be randomly assigned to experimental conditions, and involve statistically
controlling for variables that the researcher has identified as differing
between groups.
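
Random assignment, one of the principal controls mentioned above, can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name randomly_assign, the group labels, and the use of a fixed seed are assumptions made for the example, and the 100 numbered families simply echo the hypothetical study described earlier.

```python
import random

def randomly_assign(participant_ids, groups=("experimental", "control"), seed=None):
    """Shuffle participants, then deal them into the groups in turn.

    Dealing round-robin after a shuffle keeps group sizes within one of
    each other while leaving each participant's group to chance.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment

# Hypothetical study: 100 families identified only by number.
families = list(range(1, 101))
assignment = randomly_assign(families, seed=42)
print({group: len(members) for group, members in assignment.items()})
```

Because chance alone determines group membership, extraneous factors (known or unknown) are expected to be spread roughly evenly across groups, although any single randomization can still produce imbalance, particularly in small samples.
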

History 

Generally, history as a threat to internal validity refers to events or incidents 
that take place during the course of the study that might have an unin- 
tended and uncontrolled-for impact on the study's final outcome (or the 
dependent variable; Kazdin, 2003c). These events tend to be global 
enough that they affect all or most of the participants in a study. They can
occur inside or outside the study and typically occur between the pre- and 
postmeasurement phases of the dependent variable. The impact of history 
as a threat to internal validity is usually seen during the postmeasurement 
phase of the study and is particularly prevalent if the study is longitudinal 
and therefore takes place over a long period of time. Accordingly, the 
longer the period of time between the pre- and postmeasure, the greater 
the possibility that a history effect could have confounded the results of 
the study (Christensen, 1988). 

For example, an anxiety-provoking catastrophic national event could 
have an impact on many if not all participants in a study for the treatment 
of anxiety. The event could produce an escalation in symptoms that might
be interpreted as a failure of the intervention, when, in actuality, it is an 
artifact of the external event itself. Depending on the timing, this external 
event could have a significant impact on the measurement of the depen- 
dent variable. 

Another example can be found in our previous discussion of the effec- 
tiveness of parent skills training on adolescent symptoms of depression 
(see the earlier Putting It Into Practice example). In that example, symptoms of
depression were evaluated 6 months after the parental skills training inter- 
vention. It is possible that some other significant event occurred during 
that time period that might account for the reduced symptoms of depres- 
sion. One possibility is that school ended for the year and summer vaca- 
tion started, which produced a decrease in depressive symptoms among 
the sample of adolescents. So, the decrease in depression might be due to 
a historical artifact and not to the independent variable (i.e., the parent 
skills training intervention). Historical events can also take place within
the confines of the study, although this is less common. For example, an 
argument between two researchers that takes place in plain view of partic- 
ipants and is not part of the intended intervention is an event that can pro- 
duce a history effect. 

Maturation 

This threat to internal validity is similar to history in that it relates to 
changes over time. Unlike history, however, maturation refers to intrinsic 
changes within the participants that are usually related to the passage of time. 
The most commonly cited examples of this involve both biological and 
psychological changes, such as aging, learning, fatigue, and hunger (Chris- 
tensen, 1988). As with history, the presence of maturational changes oc- 
curs between the pre- and postmeasurement phases of the study and in- 
terferes with interpretations of causation regarding the independent and 
dependent variables. Historical and maturational threats tend to be found 
in combination in longitudinal studies. 

In our parent skills training example, might the symptoms of depres- 
sion have improved because the parents had an additional 6 months to 
develop as parents, regardless of the skills training? Although it's unlikely,
this is an alternative rival hypothesis that must be considered and con- 
trolled for, most likely through the inclusion of a control or comparison 
group that did not receive the parent skills training. 

Another example would be a study examining the effects of visualiza- 
tion on strength training in male adolescents over a specified period of 
time. As adolescent males mature naturally, we would expect to see incre- 
mental increases in strength regardless of the visualization intervention. 
So, a causal statement regarding the effects of visualization on strength in 
adolescent males would have to be qualified in the context of the matura- 
tional threat to internal validity. Again, this threat could be minimized 
through the use of control or comparison groups. 
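
The logic of using a control group to separate a treatment effect from maturation can be illustrated with a short Python sketch. All of the strength scores below are invented, and the function name mean_change is hypothetical; the point is only to show the comparison, not to suggest an analysis plan.

```python
from statistics import mean

def mean_change(pre, post):
    """Average post-minus-pre change across the same participants."""
    if len(pre) != len(post):
        raise ValueError("pre and post must list the same participants")
    return mean(after - before for before, after in zip(pre, post))

# Hypothetical strength scores (e.g., kilograms lifted); all values invented.
visualization_pre  = [50, 55, 60, 52]
visualization_post = [58, 62, 69, 60]
control_pre  = [51, 54, 61, 53]
control_post = [55, 58, 66, 57]

treated_gain = mean_change(visualization_pre, visualization_post)
# Change without the intervention: maturation plus any other shared influences.
control_gain = mean_change(control_pre, control_post)

print("Gain with visualization:", treated_gain)
print("Gain without visualization:", control_gain)
print("Gain beyond the control group:", treated_gain - control_gain)
```

Only the portion of the gain that exceeds the control group's gain is a candidate for attribution to the intervention, and even that difference would need to be tested statistically before any causal statement is made.
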



Instrumentation 

This threat to internal validity is unrelated to participant characteristics and
refers to changes in the assessment of the dependent variable, which are usually
related to changes in the measuring instrument or measurement procedures over
time (Christensen, 1988; Kazdin, 2003c). In essence, instrumentation compromises
internal validity when changes in the dependent variable result from changes
over time in the assessment instruments and scoring criteria used in the study.
There is a wide variety of measurement and assessment techniques available to
researchers, and some of these are more susceptible to instrumentation effects
than others. The susceptibility of a measure to instrumentation bias is usually
a function of standardization.



DON'T FORGET

Important Considerations Regarding Instrumentation

• Standardization refers to the guidelines established in the administration
and scoring of an instrument or other assessment method.

• Reliability is present when an assessment method measures the
characteristics of interest in a consistent fashion.

• Validity is present when the approach to measurement used in the study
actually measures what it is supposed to measure.

Standardization refers to the guidelines established in the administration
and scoring of an instrument or other assessment method, and also
encompasses the psychometric concepts of reliability and validity. An
approach to measurement is reliable if it assesses the characteristics of
interest in a consistent fashion. Validity refers to whether the approach to
measurement used in the study actually measures what it is supposed to 
measure. Instruments that are standardized and psychometrically sound 
are least susceptible to instrumentation effects, while other types of as- 
sessment methods (e.g., independent raters, clinical impressions, "home- 
made" instruments) dramatically increase the possibility of instrumenta- 
tion effects. 

For example, a researcher could use a number of measurement ap- 
proaches in a treatment study of depression. The researcher could use, for 
example, a standardized measure to assess symptoms of depression, such 
as the Beck Depression Inventory (BDI), which is a self-report, paper- 
and-pencil test known for its reliability and validity (Beck et al., 1961). The 
BDI is also standardized in that respondents are all exposed to the same 
stimuli, which is a set of questions related to symptoms of depression. 
This high level of standardization in administration and scoring makes it 
unlikely that instrumentation effects would be present. In other words, 
unless the researchers altered the items of the BDI, modified the adminis- 
tration procedures, or switched to a different version of the instrument 
midway through the study, we would not expect instrumentation to be a 
significant threat to the internal validity of the study. 

Conversely, other approaches to measurement are more susceptible to 
possible instrumentation effects. There are many different ways to mea- 
sure the construct of depression. Let's assume that the BDI was unavail- 
able, so the researcher had to rely on some other method for assessing the 
impact of treatment on symptoms of depression. A common solution to 
this problem might be to have independent raters assess the level of symp- 
toms based on clinical diagnostic criteria and then assess the participants 
over the course of the intervention. This type of approach to measure- 
ment, if poorly implemented, dramatically increases the likelihood of in- 
strumentation effects. 

Instrumentation Effects

Instrumentation effects are least prevalent when using standardized,
psychometrically sound instruments to measure the variables of interest. When
such measures are not available, the likelihood of instrumentation effects rises
dramatically. In such cases, ongoing training of raters and interrater
reliability checks are an absolute necessity.

The primary concern is that the raters might have different standards for what
qualifies as meeting the criteria for symptoms of depression. Let's assume that
rater A requires significantly more impairment in functioning from a participant
before acknowledging that depression or depressive symptoms are actually
present. Furthermore, the rater standards for identifying the symptoms and
making the diagnosis of depression might fluctuate significantly
over time, which adds yet another layer of difficulty when the researcher 
attempts to interpret the impact of treatment (the independent variable) 
on depression (the dependent variable). Without standardization, there is 
a significant likelihood that any changes in the dependent variable over the 
course of treatment might be the result of changes in scoring criteria and 
not the intervention itself. These issues are usually addressed through on- 
going training and frequent interrater reliability checks (a statistical method 
for determining the level of consistency and agreement between different 
raters). 
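
The text does not name a specific statistic for interrater reliability checks. As one illustration, the sketch below computes Cohen's kappa, a commonly used chance-corrected index of agreement between two raters. The choice of kappa, the function name, and the ratings are all assumptions made for illustration; they do not come from a real study.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of cases on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten participants ("dep" = depression present).
rater_a = ["dep", "dep", "none", "none", "dep", "none", "none", "none", "dep", "none"]
rater_b = ["dep", "none", "none", "none", "dep", "none", "dep", "none", "dep", "none"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Repeating a check like this at several points during a study is one way to notice whether rater standards are drifting over time.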



Testing 

This threat to internal validity refers to the effects that taking a test on one 
occasion may have on subsequent administrations of the same test 
(Kazdin, 2003c). In essence, when participants in a study are measured 
several times on the same variable (e.g., with the same instrument or test), 
their performance might be affected by factors such as practice, memory, 
sensitization, and participant and researcher expectancies (Pedhazur & 
Schmelkin, 1991). This threat to internal validity is most often encoun- 
tered in longitudinal research where participants are repeatedly measured 
on the same variables over time. The ultimate concern with this threat to 
internal validity is that the results of the study might be related to the
repeated testing or evaluation and not the independent variable itself.

For example, let's consider a hypothetical study designed to assess the 
impact of guided imagery techniques on the retention of a series of ran- 
dom symbols. First, each participant is exposed to the random symbols 
and then asked to reproduce as many as possible from memory after a
15-minute delay. This serves as a pretest or baseline measure of memory
performance. Next, participants are exposed to the intervention, which is a
series of guided imagery techniques that the researchers believe will im- 
prove retention of the symbols. The researchers believe that recall of the 
symbols will increase as participants learn each of six imagery techniques, 
with the highest level of recall coming after participants have learned all of 
the imagery techniques. In this case, the guided imagery technique is the 
intervention or independent variable, and the recall of the random sym- 
bols is the dependent variable. The participants are exposed to six learn- 
ing trials. During each trial, the participant is taught a new imagery tech- 
nique, exposed to the same random symbol stimuli, and then asked to 
reproduce as many as possible after a 15-minute delay. Ideally, the partici- 
pants are using their imagery techniques to aid in retention of the symbols. 
Keep in mind here that the participants are being tested on the same set of 
symbols on six different occasions, and that the symbol set in this example 
is the testing instrument and outcome measure. The researchers run their 
trials and confirm their hypotheses. The participants perform above base- 
line expectations after the first trial and their performance improves con- 
sistently as they are exposed to additional imagery techniques. The best 
performance is seen after the final imagery technique is implemented. 

Can it be said that the imagery techniques are the cause of the improved 
retention of the random symbols? The researchers could make that asser- 
tion, but the presence of a testing effect seriously undermines the credi- 
bility of their results. Remember that the participants are exposed to the 
same test or outcome — the random symbols — on at least seven different 
occasions. This introduces a strong plausible rival hypothesis that the
improvement in retention is simply due to a practice effect, or the repeated
exposure to the same stimuli. As the researchers did not account for this
possibility with a control group or by varying the content of the symbol
stimulus, this remains a legitimate explanation for the findings. In other words,
the practice effect provides a plausible alternative hypothesis.
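
To make the role of a control group concrete, the following sketch compares an imagery group with a control group that is tested on the same symbols but never taught the techniques. All scores are hypothetical and invented for illustration, and subtracting the control group's gain is a simplified way of separating the practice effect, not the only one.

```python
# Hypothetical mean recall scores (symbols recalled) at baseline and after
# each of six trials. Both groups see the same symbol set every time; only
# the imagery group is taught the imagery techniques.
imagery_group = [10, 13, 15, 18, 20, 22, 25]
control_group = [10, 12, 14, 16, 17, 19, 21]

def total_gain(scores):
    """Change from the baseline (first) score to the final score."""
    return scores[-1] - scores[0]

practice_gain = total_gain(control_group)          # repeated testing alone
observed_gain = total_gain(imagery_group)          # practice plus imagery
imagery_estimate = observed_gain - practice_gain   # gain beyond practice

print(f"Gain with imagery training: {observed_gain}")
print(f"Gain from repeated testing alone: {practice_gain}")
print(f"Gain attributable to imagery (estimate): {imagery_estimate}")
```

With these made-up numbers, most of the improvement in the imagery group would also have appeared without any imagery training.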

Statistical Regression 

This threat to internal validity refers to a statistical phenomenon whereby 
extremely high or low scores on a measure tend to revert toward the arith- 
metic mean or average of the distribution with repeated testing (Chris- 
tensen, 1988; Kazdin, 2003c; Neale & Liebert, 1973). 

For example, let's assume that we obtained the following array of scores 
on our symbol retention measure from the preceding example: 5, 12, 18, 
19, 27, 42, 55, and 62. The mean for this set of scores is 30 (240 ÷ 8 = 30).
On average, the participants in the study recalled 30 random symbols 
when assessed for retention. Generally, statistical regression suggests that 
over time and repeated administration of the memory assessment, we 
would expect the scores in this array to revert closer to the mean score of 
30. This is particularly true of extreme scores that lie far outside the nor- 
mal range of a distribution. These extreme scores are also known as outliers. 
In a distribution of scores with a mean of 30, it would be reasonable to 
identify, at a minimum, the scores of 5 and 62 as outliers. So, on our next 
administration of the memory test, we would expect all of these scores to 
revert closer to the mean, regardless of the effect of the intervention (or indepen- 
dent variable). In addition, we would probably see the largest movement 
toward the mean in the more extreme scores. 
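
The expected size of this movement can be sketched with a simple model. The code below assumes a test-retest reliability of 0.8 (a value chosen only for illustration, not taken from the text) and applies the standard classical-test-theory expectation that a retest score lies at the mean plus the reliability times the original deviation from the mean.

```python
from statistics import mean

scores = [5, 12, 18, 19, 27, 42, 55, 62]
sample_mean = mean(scores)  # 240 / 8 = 30

# ASSUMPTION: a test-retest reliability of 0.8, chosen only for illustration.
RELIABILITY = 0.8

def expected_retest(score, group_mean, reliability):
    """Expected retest score when nothing but measurement error changes."""
    return group_mean + reliability * (score - group_mean)

for s in scores:
    retest = expected_retest(s, sample_mean, RELIABILITY)
    print(f"score {s:>2} -> expected retest {retest:5.1f} (shift {retest - s:+.1f})")
```

Under this assumption every score is expected to move toward 30, and the scores farthest from the mean (5 and 62) show the largest expected shifts.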

This phenomenon is particularly prevalent in pretest and posttest designs. An
example can illustrate this point. A study is designed to assess the impact of a
new, 10-week treatment for anxiety. The researchers are interested in the effects of
their new treatment on low, medium, and high anxiety levels as deter- 
mined by a score on a standardized measure of anxiety. The researchers 
hope that their new treatment will reduce symptoms of anxiety across 
each of the three conditions. Accordingly, each participant is administered 
the anxiety measure as a pretest to determine his or her current anxiety 
level and then is assigned to one of three groups — low, medium, or high 
anxiety — on the basis of predetermined cutoff scores. For the sake of clar- 
ity, let's assume the mean anxiety level for the entire sample was 30, the 
mean for the low-anxiety group was 12, the mean for the medium-anxiety
group was 29, and the mean for the high-anxiety group was 42. 

Each of these groups then receives ongoing treatment and assessment 
over the 10-week protocol. The results of the study suggest that anxiety 
scores increased in the low-anxiety condition, stayed roughly the same in
the medium-anxiety condition, and decreased in the high-anxiety condi- 
tion. Our somewhat befuddled researchers conclude that their treatment 
is effective only for cases of severe anxiety, exacerbates symptoms in indi- 
viduals with minimal symptoms of anxiety, and has little to no effect on 
moderate levels of anxiety. Although these findings might be accurate, it is 
also possible that they are the result of statistical regression. The scores in 
the high-anxiety group might have reverted to the overall group mean over 
the 10 weeks, giving the impression that symptom reduction resulted from 
the intervention. Similarly, the perceived increase in symptoms in the low- 
anxiety group might be the result of those low scores' moving toward the 
overall group mean. In other words, the mean scores for both of these 
groups included extreme scores, or outliers, which were then influenced 
by regression to the mean. It is therefore possible that we would have seen 
the same results even without the impact of the independent variable. 
Note that the medium-anxiety group did not change and that this was the 
group whose mean score was closest to the overall sample mean, which 
makes it least susceptible to the effects of statistical regression. This could 
account for the possibly erroneous conclusion that the treatment proto- 
col was ineffective on moderate symptoms of anxiety. 
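
A small simulation can show how this pattern arises with no treatment effect at all. The sketch below is built entirely on assumptions made for illustration: each participant has a stable "true" anxiety level, each measurement adds independent random error, the cutoff scores of 20 and 40 are hypothetical, and the treatment does nothing.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# ASSUMPTIONS: stable "true" anxiety ~ N(30, 8); each measurement adds
# independent error ~ N(0, 6); the treatment has NO effect at all.
N = 6000
LOW_CUTOFF, HIGH_CUTOFF = 20, 40  # hypothetical cutoff scores

changes = {"low": [], "medium": [], "high": []}
for _ in range(N):
    true_anxiety = rng.gauss(30, 8)
    pretest = true_anxiety + rng.gauss(0, 6)
    posttest = true_anxiety + rng.gauss(0, 6)
    # Groups are formed from the pretest score, as in the example.
    if pretest < LOW_CUTOFF:
        group = "low"
    elif pretest > HIGH_CUTOFF:
        group = "high"
    else:
        group = "medium"
    changes[group].append(posttest - pretest)

mean_change = {g: mean(v) for g, v in changes.items()}
for g, c in mean_change.items():
    print(f"{g:>6} anxiety group: mean change {c:+.2f}")
```

In this simulated data the low group's scores rise, the high group's scores fall, and the medium group barely moves, which mirrors the pattern the hypothetical researchers attributed to their treatment.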

Selection Biases

This threat to internal validity refers to systematic differences in the as- 
signment of participants to experimental conditions. As noted in Chapter 
5, selection biases are prevalent in quasi-experimental research in which 
participants are assigned to experimental conditions or comparison 
groups in a nonrandom fashion (Christensen, 1988; Kazdin, 2003c; Rosnow &
Rosenthal, 2002). Remember, randomization is designed to control
for systematic participant differences across experimental and control 
groups. In essence, randomization evenly distributes and equates groups 
on any potential confounding variables. Without randomization, it is more 
difficult to account and control for these systematic variations in partici- 
pant characteristics. As with all threats to internal validity, selection bias 
can have a negative impact on the researcher's ability to draw causal infer- 
ences about the effects of the independent variable. 
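
As a minimal sketch of why random assignment helps, the code below randomly splits hypothetical participants into two groups and compares the groups on a preexisting characteristic. The "motivation" scores, group sizes, and seed are assumptions for illustration. Note that randomization balances such characteristics on average, not perfectly in every sample.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical participants, each with a preexisting "motivation" score
# that the researcher never measures (a potential confound).
participants = [rng.gauss(50, 10) for _ in range(2000)]

# Random assignment: shuffle, then split the list in half.
rng.shuffle(participants)
half = len(participants) // 2
experimental, control = participants[:half], participants[half:]

difference = mean(experimental) - mean(control)
print(f"experimental mean motivation: {mean(experimental):.2f}")
print(f"control mean motivation:      {mean(control):.2f}")
print(f"difference:                   {difference:+.2f}")
```

With intact groups, such as existing classrooms, nothing guarantees that the difference on an unmeasured characteristic will be this small.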

As mentioned previously, selection biases are common in quasi- 
experimental research in which randomization cannot be accomplished. 
The most common example of this is when the experimenter attempts to 
conduct research in a setting or under a set of circumstances where the 
groups are already formed and cannot be altered. In other words, for 
whatever reason, randomization is not feasible or possible. 

For example, let's consider a design to test the effectiveness of a classroom 
intervention to improve mathematics skills in two classes of third graders. 
Because the students are already grouped into two classes, random assignment is
not possible. Suppose that Class 1 receives the intervention and Class 2 serves
as the comparison group.

If Class 1 performs better, is it safe to conclude that the intervention,
or independent variable, is responsible for the improvement? Although it 
is possible, there are a number of plausible rival hypotheses that have not 
been controlled for. Most of these hypotheses revolve around preexisting 
differences between the two groups (i.e., before the intervention was de- 
livered). For example, it is possible that the students in Class 1 are more 
motivated or mature than their counterparts in Class 2. In fact, any preex- 
isting difference between the compositions of the two groups is a threat to 
internal validity. Any of these differences might provide a valid explana- 
tion for the results of the math intervention. 

Attrition 

This threat to internal validity refers to the differential and systematic loss 
of participants from experimental and control groups. In essence, partic- 
ipants drop out of the study in a systematic and nonrandom way that can 
affect the original composition of groups formed for the purposes of the 
study (Beutler & Martin, 1999). The potential net result of attrition is that 
the effects of the independent variable might be due to the loss of partic- 
ipants and not to the manipulation of the independent variable. 

Commentators have noted that this threat to internal validity is com- 
mon in longitudinal research and is a direct function of time (Kazdin, 
2003c; Phillips, 1985). In general, attrition rates average between 40 and 
60% in longitudinal intervention research, with most participants drop- 
ping out during the earliest stages of the study (Kazdin). Attrition applies 
to most forms of group and single-case designs and can be a threat to in- 
ternal validity even after the researcher has randomly assigned participants 
to experimental and control groups. This is because attrition occurs as the 
study progresses and after participants have been assigned to each of the 
conditions. Attrition raises the possibility that the groups differ on certain 
characteristics that were originally controlled for through randomization. 
In other words, the remaining participants no longer represent the origi- 
nal sample and the groups might no longer be equivalent. 

Let's consider an example. A researcher decides to conduct a study of 
the effectiveness of a new drug on symptoms of anxiety. Randomization 
is used to assign participants to either a medication (i.e., experimental)
group or placebo (i.e., control) group. Let's assume that over the course of 
the study, participants in the experimental group experience some rela- 
tively severe side effects from the medication and an increase in anxiety, 
causing some to drop out of the study. The placebo group does not expe- 
rience the side effects, so the dropout rate is lower in that group. The av- 
erage anxiety levels of the two groups are compared at the conclusion of 
the study, and the results suggest that the participants in the medication 
group are less anxious than those in the placebo group. The results seem 
to support the conclusion that the medication was effective for the treat- 
ment of anxiety. The problem with this conclusion is that the results are 
potentially confounded by attrition. If no study participants had dropped 
out of the medication group, it is likely that the results would have been 
different. In this example, notice that attrition was still a factor after ran- 
domization and that the final sample was probably very different from the 
original sample used to form the experimental and control groups. 
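
The logic of this example can be sketched in a short simulation. Everything in it is an assumption made for illustration: the drug has no true effect, anxiety scores follow the same distribution in both groups, and only the more anxious participants in the medication group tend to drop out.

```python
import random
from statistics import mean

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# ASSUMPTIONS: end-of-study anxiety ~ N(50, 10) in BOTH groups (the drug
# has no true effect); in the medication group, participants scoring above
# 55 drop out with probability 0.8; nobody drops out of the placebo group.
N_PER_GROUP = 2000
medication = [rng.gauss(50, 10) for _ in range(N_PER_GROUP)]
placebo = [rng.gauss(50, 10) for _ in range(N_PER_GROUP)]

def completes_study(anxiety):
    """Return True if a medication participant stays in the study."""
    if anxiety > 55:
        return rng.random() >= 0.8  # 80% of this subgroup drops out
    return True

medication_completers = [a for a in medication if completes_study(a)]

print(f"medication completers: {len(medication_completers)} of {N_PER_GROUP}")
print(f"medication mean anxiety (completers): {mean(medication_completers):.1f}")
print(f"placebo mean anxiety (all):           {mean(placebo):.1f}")
```

Here the medication group looks less anxious at the end of the study only because its most anxious members are no longer in the data.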

Diffusion or Imitation of Treatment 

This threat to internal validity is common in various forms of medical and 
psychotherapy treatment effectiveness research, and it manifests itself in 
two distinct but related sets of circumstances. 

The first set of circumstances is the unintended exposure of a control 
group to the actual or similar intervention (independent variable) in- 
tended only for the experimental condition (Kazdin, 2003c; Pedhazur & 
Schmelkin, 1991). Let's consider a study examining the relative benefits of 
exercise and nutritional counseling on weight loss. The researchers hy- 
pothesize that exercise is more effective than nutritional counseling and 
assign participants to an exercise, nutritional counseling, or no- 
intervention control group. The experimental group receives a cus- 
tomized exercise regimen, the nutritional group receives general nutri- 
tional counseling, and the control group is simply monitored for weight 
loss or gain for the same time period. 

During the course of the study, a well-intentioned, but misguided, nu- 
tritional counselor extols the benefits of exercise to the members of the 
nutritional counseling group. This additional counseling was not part of
the original design and the researchers are unaware that it is taking place. 
Although the nutritional counseling group is not receiving the actual ex- 
ercise intervention, the discussion of exercise with this group might have 
an unintended and uncontrolled-for effect. For example, this knowledge 
might encourage participants in the nutritional group to seek out their 
own exercise program or to change their day-to-day habits in such a way 
that increases their general activity level, such as taking the stairs instead 
of the elevator. If that is indeed the case, then the nutritional group has
received an intervention similar to that of the experimental group. At a minimum,
the results could be confounded because the exercise condition has diffused into
the nutritional group, so the nutritional condition is not being delivered as the
researchers had originally intended. The threat to internal validity in this
example lies in the possibility that the exercise and
nutritional groups have now received similar interventions, which might 
equalize performance across the groups (Kazdin, 2003c). 

The second set of circumstances arises when the experimental group 
does not receive the intended intervention at all (Kazdin, 2003c; Pedhazur 
& Schmelkin, 1991). In the first case, participants in a control group either 
gain knowledge about or are unintentionally exposed to the experimental 
intervention (the independent variable). In this case, the researcher be- 
lieves that the experimental group has received the intervention when, in 
reality, it has not. This is a common threat in many forms of psychotherapy
research. Take, for example, a study comparing the effectiveness of behavioral
and psychodynamic treatments, in which the results suggest that they are both
equally effective.
What the researchers do not know is that the behavioral therapist has ei- 
ther intentionally or unintentionally strayed from the specified protocol at 
times and included elements of the psychodynamic treatment in the be- 
havioral condition. In other words, the behavioral group might not have 
received a behavioral intervention at all. At best, they have received a hy- 
brid of psychodynamic and behavioral treatment. As in our previous ex- 
ample, rather than comparing two distinct conditions, the researchers 
might be comparing two conditions that are more similar than intended by 
the original research design. Again, this might equalize the performance of 
the experimental and control groups, which could have the effect of dis- 
torting or clouding the results of the study. 

Special Treatment or Reactions of Controls 

These relatively common threats to internal validity may be caused by the 
special, often compensatory, treatment or attention given to the control 
group. Even in the absence of special attention or treatment, controls may 
realize that they are in a "lesser" condition and react by competing or oth- 
erwise improving their performance. Either of these situations can equal- 
ize the performance of the experimental and control conditions and 
thereby "wash out" between-group differences on the dependent variable
(Christensen, 1988; Kazdin, 2003c; Pedhazur & Schmelkin, 1991). Special 
treatment itself is a relatively common threat to internal validity and can 
be related to any number of activities conducted with the control (nonin- 
tervention) group. Remember that in this case, the intervention is also the 
independent variable. These factors range from simple human interaction 
to more concrete examples such as financial compensation or special priv- 
ileges. For example, attention alone might produce an unintended change 
in behavior. 

Let's assume that there are two groups in a study of depression. The in- 
tervention or experimental group receives therapy while the control group 
is simply monitored weekly for symptom severity. The monitoring con- 
sists of an hour-long structured interview with a research assistant. This 
weekly social attention might act as an intervention despite the fact that it 
was intended for monitoring purposes only. Perhaps the interview gives
the control participants the opportunity to discuss their symptoms, which 
produces some symptom relief even without therapy per se. After all, so- 
cial support has been linked to positive outcomes for depression. The 
same effect might be observed even in the absence of human contact. For 
example, just filling out a self-report measure of depressive symptoms in
an empty room might have the same effect by raising the awareness of the 
control participants in regard to their current symptom level. Reinforcers 
and other incentives might have a similar effect. Giving the control par- 
ticipants money or special privileges might have an impact on levels of de- 
pression by raising self-esteem or reducing hopelessness. Like diffusion or 
imitation of treatment, this threat to internal validity might equalize the 
performance of the experimental and control groups, which could have 
the effect of distorting or clouding the results of the study. 

In conclusion, threats to the internal validity of a study (summarized in 
Rapid Reference 6.1) are common and, at times, unavoidable. They can oc- 
cur alone or in combination, and they can create unwanted plausible alter- 
native hypotheses for the results of a study. These rival hypotheses may 
make it difficult to determine causation. Some of these threats can be han- 
dled effectively through design components (e.g., control groups and ran- 
domization) at the outset of the study, while others (e.g., attrition) take 
place during the course of the study. Accounting for these threats is a crit- 
ical aspect and function of research methodology that should take place, 
if possible, at the design stage of the study. Refer to Chapter 3 for a gen- 
eral discussion of these strategies. 

EXTERNAL VALIDITY 

External validity is concerned with the generalizability of the results of a re- 
search study. In all forms of research design, the results and conclusions of 
the study are limited to the participants and conditions as defined by the 
contours of the study. External validity (compare to ecological validity in Rapid
Reference 6.2) refers to the degree to which research results generalize to 
other conditions, participants, times, and places (Graziano & Raulin, 2004). 

Rapid Reference 6.1

Threats to Internal Validity

History: Global internal or external events or incidents that take 
place during the course of the study that might have unintended and 
uncontrolled-for impacts on the study's final outcome (i.e., on the de- 
pendent variable). 

Maturation: Intrinsic changes within the participants that are usually 
related to the passage of time. 

Instrumentation: Changes in the assessment of the dependent
variable that are usually related to changes in the measuring instrument
or measurement procedures over time.

Testing: The effects that taking a test on one occasion may have on 
subsequent administrations of the test. It is most often encountered in 
longitudinal research, in which participants are repeatedly measured on 
the same variables of interest over time.

Statistical regression: Statistical phenomenon, prevalent in pretest 
and posttest designs, in which extremely high or low scores on a mea- 
sure tend to revert toward the mean of the distribution with repeated 
testing. 

Selection bias: Systematic differences in the assignment of partici- 
pants to experimental conditions. 

Attrition: Loss of research participants that may alter the original 
composition of groups and compromise the validity of the study. 

Diffusion or imitation of treatment: Unintended exposure of a 
control group to an intervention intended only for the experimental 
group, or a failure to expose the experimental group to the intended 
intervention. This confound most commonly occurs in medical and psy- 
chological intervention studies. 

Special treatment or reactions of controls: Relatively common 
threats to internal validity in which either (1) special or compensatory
treatment or attention is given to the control condition, or (2) partici- 
pants in the control condition, as a result of their assignment, react or 
compensate in a manner that improves or otherwise alters their per- 
formance. 

Rapid Reference 6.2

Ecological and Temporal Validity

Although the terms "ecological validity" and "external validity" are some- 
times used interchangeably, a clear distinction can be drawn between the 
two. Of the two, external validity is a more general concept. It refers to the 
degree to which research results generalize to other conditions, partici- 
pants, times, and places, and it is ultimately concerned with the conclu- 
sions that can be drawn about the strength of the inferred causal relation- 
ship between the independent and dependent variables to circumstances 
beyond those experimentally studied. Ecological validity is a more specific 
concept that refers to the generalization of findings obtained in a labora- 
tory setting to the real world. 

Temporal validity is another term that is related broadly to external validity.
It refers to the extent to which the results of a study can be generalized
across time. More specifically, this type of validity refers to the effects of
seasonal, cyclical, and person-specific fluctuations that can affect the gen- 
eralizability of the study's findings. 



Therefore, a study has more external validity when the results generalize
to other populations, settings, and circumstances.

Consider, for example, a hypothetical study designed to assess the
effectiveness of a new intervention for test anxiety. Again, the intervention
is the independent variable, while test anxiety is the dependent variable. 
The study is being conducted at a major East Coast university, and the par- 
ticipants are college freshmen currently taking an introductory-level psy- 
chology class. Although this might not seem realistic at first glance, many 
studies are conducted with college students because they are easily acces- 
sible and form samples of convenience (Kazdin, 2003c). Students are as- 
sessed to determine their levels of test anxiety and then are assigned to ei- 
ther a no-treatment control group or an experimental group that receives 
the intervention. The new therapy is remarkably effective and significantly 
reduces test anxiety in the experimental group. The researchers immedi- 
ately market their intervention as being a generally effective treatment for 
test anxiety. Can the researchers support their claim based on the results of 
their study? Hopefully, you have already realized that this study has serious 
flaws related to internal validity, but let's put that aside for the purposes of 
this example and focus only on issues surrounding external validity. 

Remember that external validity is the degree to which research results 
generalize to other conditions, participants, times, and places. A study has 
external validity when the results generalize to other populations, settings, 
and circumstances. In our example, the researchers have found that their 
intervention effectively reduces test anxiety, and they are assuming that it 
is effective across a wide variety of settings and populations. They might 
be correct, but the design of this study does not have strong external va- 
lidity for a number of reasons, which undermines the assertion that the in- 
tervention is effective for other populations. 

First, the study was conducted with a sample of college freshmen en- 
rolled in an introductory-level psychology course. This is a very narrow 
sample; would the results apply to broader populations, such as elemen- 
tary school children, high school students, or college seniors? Would the 
results apply to college freshmen who were not enrolled in an introductory- 
level psychology class? We do not know for certain because these individ- 
uals were not included in the sample used in the study. 

Second, do the results apply to other settings, such as different univer- 
sities, high schools, classes, and business environments? The effectiveness 
of the intervention might be limited to the setting where the study was
conducted. For example, we might find that the results do not generalize 
to universities on the West Coast or to high schools. In other words, the 
effectiveness of the intervention might be specific to the population rep- 
resented by the sample used in the study. 

Third, is there something unique about the conditions of the study? For 
example, was the study conducted around midterm or final exams, when 
anxiety levels might be unusually high? Would the intervention have been 
as effective if the study had occurred at a different time during the semes- 
ter? As mentioned previously, the answer is that we do not know for sure. 
In terms of external validity, the most accurate statement that can be made 
from the results of our hypothetical study is that the intervention was ef- 
fective for college freshmen in introductory-level psychology classes at a 
major East Coast university. Any other conclusions would not necessarily 
be supported, and additional research across different times, places, and 
conditions would be necessary to support any other conclusions. 



Threats to External Validity 

As with internal validity, there are confounds and characteristics of a study 
that can limit the generalizability of the results. These characteristics and 
confounds are collectively referred to as threats to external validity, and they 
include sample characteristics, stimulus characteristics and settings, reac- 
tivity of experimental arrangements, multiple-treatment interference, 
novelty effects, reactivity of assessment, test sensitization, and timing of 
measurement (Kazdin, 2003c). Controlling these influences allows the re- 
searchers to more confidently generalize the results of the study to other 
circumstances and populations (Kazdin; Rosnow & Rosenthal, 2002). 

Sample Characteristics 

This threat to external validity refers to a phenomenon whereby the results 
of a study apply only to a particular sample. Accordingly, it is unclear whether 
the results can be applied to other samples that vary on characteristics such 
as age, gender, education, and socioeconomic status (Kazdin, 2003c). 
An example of sample characteristics can be found in our earlier
discussion about external validity. In that example, we noted that the sample
consisted of college freshmen enrolled in an introductory-level psychol- 
ogy class. As we noted, we cannot assume that the findings of that study 
would necessarily hold true for a different sample, such as high school stu- 
dents or elementary school children. In addition, we cannot even assume 
that the findings would hold true for college freshmen generally. Through 
further research, we might discover that the intervention was effective
only for psychology students and did not generalize to freshmen taking 
introductory-level business or science classes. In other words, even this 
subtle difference in sample characteristics can have a significant effect on 
the generalizability of a study's results. Clearly, it would not be possible or 
practical to include every possible population characteristic in our sample, 
so we are always faced with the possibility that sample characteristics are a 
confound to the external validity of any study. Accordingly, conclusions
drawn from the results of a study tend to be limited to the characteristics
represented by the sample used in the study.

Diversity Characteristics

Sample characteristics can encompass a wide variety of traits and
demographic characteristics, with some of the most common being age, gender,
education, and socioeconomic status. Commentators have noted that
some diversity-related characteristics are not well represented in most
forms of research (Kazdin, 2003c). The primary concern in this area is that
there is an overrepresentation of some groups, such as college students,
and a related, limited inclusion of underrepresented and minority groups,
such as Hispanic Americans and women. Diversity characteristics are an
important issue in terms of external validity, and they can have important
and far-reaching consequences for all strata of society. For example, the
results of a medication effectiveness study conducted only on White
males might not hold true for a different racial group. The possible
ramifications should be obvious. Similarly, a study designed to provide
information needed to make an important public policy decision should include a
sample diverse enough to accurately capture the particular group that will
be directly impacted by the decision. Although these are only two
examples, diversity factors should be considered in all types of research.

Stimulus Characteristics and Settings 

This threat to external validity refers to an environmental phenomenon in 
which particular features or conditions of the study limit the generaliz- 
ability of the findings (Brunswik, 1955; Pedhazur & Schmelkin, 1991). 
Every study operates under a unique set of conditions and circumstances 
related to the experimental arrangement. The most commonly cited ex- 
amples include the research setting and the researchers involved in the 
study. The major concern with this threat to external validity is that the 
findings from one study are influenced by a set of unique conditions, and 
thus may not necessarily generalize to another study, even if the other 
study uses a similar sample. 

Let's return again to our previous example involving the intervention 
for test anxiety. That study found that the intervention was effective for 
test anxiety with college freshmen enrolled in an introductory-level psy- 
chology class at a major East Coast university. A colleague at a West Coast 
university decides to replicate the study using a sample of college fresh- 
men enrolled in an introductory-level psychology class. Despite following 
our East Coast procedures to the letter, our colleague does not find that 
the intervention was effective. Although there could be a number of 
explanations for this, it is possible that a stimulus-characteristics-and- 
settings confound is present. The setting where the intervention is deliv- 
ered is no doubt different at our West Coast colleague's university — for 
example, it could be less comfortable than our East Coast setting. Simi- 
larly, a different individual is delivering the intervention to the college 
freshmen on the West Coast, and this individual might be less competent 
or less approachable than his or her East Coast counterpart. Each of these 
is a potential source of a stimulus-characteristics-and-settings confound.

Reactivity of the Experimental Arrangements 

This threat to external validity refers to a potentially confounding variable 
that is a result of the influence produced by knowing that one is
participating in a research study (Christensen, 1988). In other words, the
participants' awareness that they are taking part in a study can have an
impact on their attitudes and behavior during the course of the study. This,
in turn, can have a significant impact on any results obtained from the study
and is especially problematic when participants know the purpose or hypotheses
of the study. We discussed strategies for limiting participants' knowledge
about a study's hypotheses in Chapter 3. As a threat to external validity, the 
issue becomes whether the same results would have been obtained had the 
participants been unaware that they were being studied (Kazdin, 2003c). 
This threat to external validity is a very common one. The primary reason 
for this is that ethical standards require that participants provide informed 
consent before participating in most research studies. 

For example, let's consider a study designed to evaluate the effective- 
ness of a 10-week behavior modification program devised to reduce re- 
cidivism in adolescent offenders. The experimental group receives the 
intervention (i.e., the independent variable) and the control group does not. 
The researchers find that the experimental group shows lower levels of 
recidivism (i.e., the dependent variable) when compared to the control 
group. The researchers might be tempted to say that the intervention was 
responsible for the findings; however, it might be that the behavior in 
question improved because the participants had assumed a compliant at- 
titude toward the intervention. Alternatively, if the participants in the 
treatment group had adopted a more negativistic attitude toward the inter- 
vention, the results of the study might have suggested that the interven- 
tion was not successful. In any event, either outcome might be the result 
of reactivity to the experimental arrangements and not the interven- 
tion itself. 

Multiple-Treatment Interference 

This threat to external validity refers to research situations in which (1) 
participants are administered more than one experimental intervention 
(or independent variable) within the same study or (2) the same individu- 
als participate in more than one study (Pedhazur & Schmelkin, 1991). Al- 
though it is most common in treatment-outcome studies, it is also preva- 
lent in any study that has more than one experimental condition or 
independent variable. The major implication of this threat is that the
research results may be due to the context or series of conditions in which
the research is presented (Kazdin, 2003c).

In the first research situation, independent variables administered si- 
multaneously or sequentially may produce an interaction effect. In gen- 
eral, multiple independent variables administered in the same study act as 
a confound that makes it difficult to determine which one is responsible 
for the observed results. The second situation refers to the relative expe- 
rience and sophistication of the participants. Familiarity with research can 
affect the behavior and responses of participants, which again makes it dif- 
ficult to accurately interpret the results of the study. 

For example, let's consider a common situation in which multiple- 
treatment interference can occur. A 12-week treatment study is designed 
to assess the effectiveness of a combined approach to treating depression 
that encompasses elements of both psychodynamic and cognitive therapy. 
The participants are randomly divided into a control group and an experi- 
mental group. Both groups are assessed to determine symptom severity. 
The experimental group then receives 6 weeks of psychodynamic therapy 
followed by 6 weeks of cognitive therapy. At the end of 12 weeks, both the 
control and experimental groups are reassessed for symptom severity. The 
results of the assessment suggest that the experimental group experienced 
significant symptom reduction while the control group did not. The
researchers conclude that a combined psychodynamic-cognitive therapy
model is an effective approach to treating depression. 

Although this may indeed be the case, it is far from a certainty and there 
are many unanswered questions. For example, would the treatment have 
been as effective if the cognitive therapy had been administered first? 
Would 6 weeks of psychodynamic or cognitive therapy alone have pro- 
duced similar results? Did the presence of both treatment modalities ac- 
tually reduce the effectiveness of the overall intervention? Although the 
study produced significant symptom improvements, it might have pro- 
duced even better results if both forms of therapy had not been used. 
These are aspects of multiple-treatment effects that are best controlled for 
through specific research designs that were discussed in Chapter 5. 
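
The unanswered questions above correspond to comparison groups that the
hypothetical study did not include. The following Python sketch simply
enumerates those groups. The two therapies come from the example; the
function name candidate_arms and the idea of listing arms this way are
assumptions made for illustration, not a design prescribed in this chapter.

```python
from itertools import permutations

THERAPIES = ("psychodynamic", "cognitive")  # the two modalities in the example


def candidate_arms(therapies=THERAPIES):
    """List hypothetical study arms that would separate order effects and
    single-treatment effects from the combined treatment."""
    arms = [("no treatment",)]                # control group
    arms += [(t,) for t in therapies]         # each therapy on its own
    arms += list(permutations(therapies, 2))  # both therapies, in each order
    return arms


for arm in candidate_arms():
    print(" then ".join(arm))
```

Comparing the two orderings speaks to the question of sequence, and comparing
each single-therapy arm with the combined arms speaks to the question of
whether both modalities were needed.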
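
Novelty Effects

This threat to external validity refers to the possibility that the effects
of the independent variable may be due in part to the uniqueness or novelty
of the stimulus or situation and not to the intervention itself. In other
words, the novelty of
the intervention or situation acts as a confounding variable, and it is that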
novelty (and not the independent variable) that is the real explanation for 
the results. This threat to external validity is common across a wide vari- 
ety of settings and experimental designs. 

Take, for example, a situation in which researchers are trying to deter- 
mine the effectiveness of a new therapy intervention for individuals with 
a history of chronic depression. They have decided to call this new inter- 
vention "smile therapy" because the therapist is trained to smile at the 
client on a regular schedule in the hope of encouraging a positive mood 
and outlook on life. Symptoms of depression are assessed, and then the 
participants are randomly assigned to either a control group or one of 
three experimental conditions. The three experimental conditions include 
smile therapy, cognitive-behavioral therapy, and interpersonal therapy. All 
of the participants undergo their respective treatments for 4 weeks and are 
then reassessed for severity of depression. The researchers find that smile 
therapy is more effective than both cognitive-behavioral and interpersonal 
therapy on symptoms of chronic depression. 

By now, you have likely figured out that there might be a problem here 
because a novelty effect could also account for the results. Our population
in this fictitious study consists of individuals with chronic depression, so 
it is likely that they have tried many treatment modalities or at least been 
in treatment in one modality for a significant period of time. Although 
these modalities are somewhat distinct, none of them involves the thera- 
pist smiling at the participant as the intervention. The smile therapy is 
therefore unique, or novel, and this alone might account for the improve- 
ments in depression. The other issue here is that the intervention took 
place over the course of 4 weeks. If these findings were the result of a nov- 
elty, then we would expect the treatment effect to disappear over time as 
the novelty of the smile therapy diminished. Four weeks might not be a 
sufficient amount of time for the novelty to diminish, and the results of 
the study at 12 weeks might not have demonstrated a significant finding 
for this new form of therapy. The presence of a novelty effect would limit 
the researcher's ability to generalize the results of this study to situations 
or contexts in which the same effect does not exist.

This effect can also be seen outside the treatment-intervention arena. 
Suppose you wanted to determine the effectiveness of an intervention de- 
signed to increase teamwork and related productivity for top-level man- 
agers in two distinct organizational settings. Putting aside the obvious 
threats to internal validity created by conducting your study without ran- 
domization in two separate environments, let's further explore the impli- 
cations of the novelty effect. The researchers identify the top managers in 
both organizations and administer the intervention. One organization is a 
manufacturing company and the other is a large financial management 
firm. The researchers find that the intervention increases productivity and 
teamwork, but only in the financial management firm. The researchers 
therefore conclude that the intervention is effective, but only in the one 
environment. 

It is also possible, however, that the finding is due to a novelty effect and 
not to the intervention itself. Let's add some additional relevant informa- 
tion. What if you knew that the manufacturing company was engaged in a 
total quality improvement program? These programs tend to involve a 
high level of teamwork and group interaction on a daily basis. You also
discover that the financial management firm has never addressed the issue of
teamwork or group productivity in the past. Therefore, the significant 
finding might be due to the novelty of introducing teamwork into a setting 
where it had never previously been considered, and not to the teamwork 
intervention itself. Conversely, the intervention might not have been ef- 
fective in the manufacturing company because the organization had al- 
ready incorporated the model into their corporate culture. What if we tried 
the intervention in a financial management firm that had already imple- 
mented a team approach? Again, we might find that the intervention is not 
effective. If that were indeed the case, then in terms of generalizability, 
the more accurate statement might be that the intervention is effective in 
financial management companies that have never been exposed to team- 
building interventions. 

Reactivity of Assessment 

This threat to external validity refers to a phenomenon whereby partici- 
pants' awareness that their performance is being measured can alter their 
performance from what it would otherwise have been (Christensen, 1988; 
Kazdin, 2003c). Reactivity is a threat to external validity when this aware- 
ness leads study participants to respond differently than they normally 
would in the face of experimental conditions. 

Reactivity is another common threat to external validity that can occur 
across a wide variety of environments and circumstances, and it is a sub- 
stantial threat whenever formal or informal assessment is a necessary 
component of the study. For example, consider a psychotherapy outcome 
study where participants are assessed for number and severity of symp- 
toms of emotional distress. The very fact that an assessment is taking place 
might cause the participants to distort their responses for a variety of 
reasons. For example, participants might feel uncomfortable or self- 
conscious and underreport their symptoms. Conversely, participants 
might overreport their symptom levels if they suspect that doing so might 
lead to more intensive treatment. Rapid Reference 6.4 discusses the ob- 
trusiveness of the measurement process with regard to participant reac- 
tivity. 

Rapid Reference 6.4: Obtrusive vs. Unobtrusive Measurement

As mentioned previously, reactivity becomes a threat to external validity 
when participants in a study respond differently than they normally would 
in the face of experimental conditions. Although a wide variety of stimuli 
can cause reactivity, the most common example occurs during formal 
measurement or assessment. If participants are aware that they are being 
assessed, then that assessment measure is said to be obtrusive and
therefore likely to affect behavior. Conversely, the term unobtrusive
measurement refers to assessment in which the participants are unaware that the
measurement is taking place (Rosnow & Rosenthal, 2002). 



Although reactivity is common in all forms of medical and psycholog- 
ical treatment intervention studies, it is prevalent in other settings as well. 
For example, directly asking employees about their attitudes toward man- 
agement might lead to more favorable responses than might otherwise be 
expected if they filled out an anonymous questionnaire. 

Pretest and Posttest Sensitization 

These related threats to external validity refer to the effects that pretesting 
and posttesting might have on the behavior and responses of the partici- 
pants in a study (Bracht & Glass, 1968; Lana, 1969; Pedhazur & 
Schmelkin, 1991). In many forms of research, participants are pretested to 
quantify the presence of some variable of interest and to provide a base- 
line of behavior against which the effects of the experimental intervention 
(independent variable) can be evaluated. For example, a pretest for symp- 
toms of anxiety would be given to determine participant symptomology in 
a treatment study investigating the effectiveness of a new therapy for anx- 
iety disorders. The pretest information would be used as a baseline mea- 
sure and compared to a posttest measure of symptoms at the conclusion 
of the study to determine the intervention's effectiveness at reducing 
symptoms of anxiety. Generally, pretest sensitization is a possibility
whenever participants are measured prior to the administration of the
experimental intervention and the researchers are interested in measuring the
effects of the independent variable on the dependent variable.

As a threat to external validity, the concern is that exposure to the 
pretest may contribute to, or be the sole cause of, the observed changes in 
the dependent variable. In other words, would the results of the study have 
been the same if the pretest had not been administered? This has obvious 
implications for external validity because pretest sensitization might ren- 
der the results irrelevant in situations in which the same pretest was not ad- 
ministered. For example, in our previously mentioned anxiety study the 
same treatment effects might not be found in the absence of the pretest 
for current level of anxiety. 

Whereas pretesting is focused on assessing the level of a variable before 
application of the experimental intervention (or independent variable), 
posttesting is conducted to assess the effectiveness of the independent
variable. A posttest measurement can have a similar effect on external validity
as a pretest assessment. Would the same results have been found if the 
posttest had not been administered? If not, then it can be said that posttest 
sensitization might account for the results either alone or in combination 
with the experimental intervention. 

In both pre- and postassessment, the concern is whether participants 
were sensitized by either measure. If so, the findings might not
generalize to future research and actual interventions conducted
without the same procedures and assessment measures. In other words, the
presence of pre- and posttesting becomes an integral part of the interven- 
tion itself. Therefore, the effects of the independent variable might be less 
prominent or even nonexistent in the absence of pretest or posttest sensi- 
tization. 
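
One way to probe pretest sensitization, offered here as an illustration
rather than as part of the anxiety example, is a four-group layout that
crosses pretest (given or not) with treatment (given or not), often called a
Solomon four-group design. The Python sketch below uses invented posttest
means; the function name and every number are hypothetical.

```python
# Hypothetical mean posttest anxiety scores (lower is better).
# Keys are (pretested, treated); every number here is invented.
posttest_means = {
    (True, True): 12.0,    # pretest + treatment
    (True, False): 20.0,   # pretest only
    (False, True): 17.0,   # treatment only
    (False, False): 20.0,  # neither
}


def treatment_effect(means, pretested):
    """Treated-minus-control difference within one pretest condition."""
    return means[(pretested, True)] - means[(pretested, False)]


effect_with_pretest = treatment_effect(posttest_means, pretested=True)
effect_without_pretest = treatment_effect(posttest_means, pretested=False)

# A nonzero interaction contrast means the treatment effect depends on
# whether a pretest was given, which is what pretest sensitization predicts.
interaction = effect_with_pretest - effect_without_pretest

print(effect_with_pretest, effect_without_pretest, interaction)
```

In this invented data set the treatment lowers anxiety by 8 points when a
pretest is given but by only 3 points when it is not, so a conclusion drawn
from pretested participants alone would overstate the effect to be expected
in settings where no pretest is administered.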

Timing of Assessment and Measurement 

This threat to external validity is particularly common in longitudinal 
forms of research, and it refers to the question of whether the same results 
would have been obtained if measurement had occurred at a different 
point in time (Kazdin, 2003c). 

Although this threat to external validity can occur in most types of
research design, it is most common in longitudinal research. (See Chapter 5
for a more detailed discussion of longitudinal research.) Longitudinal re- 
search occurs over time and is characterized by multiple assessments over 
the duration of the study. For example, a longitudinal therapy outcome 
study might find significant results after assessment of symptoms at 2 
months, but not at 4 or 6 months. If the study concluded at the end of 2 
months, the researchers might come to the general conclusion that the 
treatment is effective for a particular disorder. This might be an overgen- 
eralization because if the study had continued for a longer period of time, 
the same treatment effect would not have been observed. Thus, the more 
appropriate conclusion about our 2-month study might be that the
treatment produces symptom relief after 2 months. The more
specific conclusion is supported by the study, while the more general
conclusion about effectiveness might not be accurate due to the timing of
measurement. Bear in mind that the reverse might also be true: A lack of 
significant findings after measurement at 2 months does not eliminate the 
possibility of significant results if the intervention and measurement oc- 
curred over a longer period of time. 
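
The dependence of a study's conclusion on the timing of measurement can be
made concrete with a small Python sketch. The symptom scores, the cutoff,
and the function name are all hypothetical, and a fixed cutoff on the group
difference stands in for a proper significance test.

```python
# Hypothetical mean symptom scores at each assessment (lower is better).
# Keys are months since the start of the study; all numbers are invented.
assessments = {
    2: {"treatment": 14.0, "control": 20.0},
    4: {"treatment": 19.0, "control": 20.0},
    6: {"treatment": 20.0, "control": 20.0},
}

MEANINGFUL_DIFFERENCE = 3.0  # assumed cutoff, standing in for a statistical test


def months_with_effect(data, cutoff=MEANINGFUL_DIFFERENCE):
    """Return the months at which control exceeds treatment by at least the cutoff."""
    return [
        month
        for month, scores in sorted(data.items())
        if scores["control"] - scores["treatment"] >= cutoff
    ]


print(months_with_effect(assessments))  # [2]
```

A study that stopped at 2 months would report an effective treatment, while
the same study extended to 6 months would not, which is why the conclusion
should be stated in terms of the time point actually measured.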

Rapid Reference 6.5 summarizes the threats to external validity we 
have discussed in this section, and Rapid Reference 6.6 provides further 
discussion. 

CONSTRUCT VALIDITY 

In the context of research design and methodology, the term construct va- 
lidity relates to interpreting the basis of the causal relationship, and it refers 
to the congruence between the study's results and the theoretical under- 
pinnings guiding the research (Kazdin, 2003c). The focus of construct va- 
lidity is usually on the study's independent variable. In essence, construct 
validity asks the question of whether the theory supported by the findings 
provides the best available explanation of the results. In other words, is the 
reason for the relationship between the experimental intervention (inde- 
pendent variable) and the observed phenomenon (dependent variable) 
due to the underlying construct or explanation offered by the researchers
(Campbell & Stanley, 1966; Cook & Campbell, 1979; Christensen, 1988;
Graziano & Raulin, 2004; Kazdin, 2003c)?

Rapid Reference 6.5: Threats to External Validity

Sample characteristics: The extent to which the results of a study
apply only to a particular sample. The key question is whether the
study's results can be applied to other samples that vary on a variety of
demographic and descriptive characteristics, such as age, gender, sexual
orientation, education, and socioeconomic status.

Stimulus characteristics and settings: An environmental
phenomenon whereby particular features or conditions of the study limit
the generalizability of the findings so that the findings from one study
do not necessarily apply to another study, even if the other study is
using a similar sample.

Reactivity of experimental arrangements: A potentially
confounding variable that results from the influence produced by knowing
that one is participating in a research study.

Multiple-treatment interference: This threat refers to research
situations in which (1) participants are administered more than one
experimental intervention within the same study or (2) the same
individuals participate in more than one study.

Novelty effects: This refers to the possibility that the effects of the
independent variable may be due in part to the uniqueness or novelty of
the stimulus or situation and not to the intervention itself.

Reactivity of assessment: A phenomenon whereby participants'
awareness that their performance is being measured can alter their
performance from what it otherwise would have been.

Pretest and posttest sensitization: These threats refer to the
effects that pretesting and posttesting might have on the behavior and
responses of study participants.

Timing of assessment and measurement: This threat refers to
whether the same results would have been obtained if measurement
had occurred at a different point in time.

Rapid Reference 6.6: Importance of Interaction Effects in Relation to
External Validity

External validity can best be understood as an interaction between
participant attributes and experimental settings and their related
characteristics. Generalization of results from any study is hampered when
the independent variable interacts with participant attributes or
characteristics of the experimental setting to produce the observed results.
Therefore, the types of threats to external validity discussed in this
chapter are far from exhaustive. Depending on the experimental design and
the research question, each study can create unique threats to external
validity that should be controlled for. If experimental control is not
possible, the limitations of the study's findings should be discussed in
sufficient detail to clarify the relevance and generalizability of the
findings.

There are two primary methods for improving the construct validity of 
a study. First, strong construct validity is based on clearly stated and accu- 
rate operational definitions of a study's variables. Second, the underlying 
theory of the study should have a strong conceptual basis and be based on 
well-validated constructs (Graziano & Raulin, 2004). Cook and Campbell 
(1979) suggest several ways to improve construct validity; these are listed 
in Rapid Reference 6.7. 

Rapid Reference 6.7: Improving Construct Validity

Cook and Campbell (1979) make the following suggestions for improving
construct validity:

• Provide a clear operational definition of the abstract concept or
independent variable.

• Collect data to demonstrate that the empirical representation of the
independent variable produces the expected outcome.

• Collect data to show that the empirical representation of the
independent variable does not vary with measures of related but different
conceptual variables.

• Conduct manipulation checks of the independent variable.

Let's consider a straightforward example to illustrate the importance of
construct validity in a study. A team of researchers is interested in
studying the factors that contribute to mortality rates in a number of
different countries. The scope of the study prohibits the use of actual
participants, so the researchers decide to conduct a correlational study in
which they analyze the statistical relationships between different countries
and available demographic data. The researchers hypothesize that education
level and family income will be significantly related to mortality rate. The
specific hypothesis is that mortality rate will drop as education level and
family income rise. In other words, the researchers are hypothesizing that
there is a negative relationship between mortality and education level and
family income. The underlying construct being tested in the study is that 
these two factors — education level and family income — are negatively re- 
lated to mortality. The researchers conduct their analyses and discover that 
their hypothesis is confirmed — that is, that mortality rates are negatively 
related to education level and family income. The researchers conclude 
that educational level and family income are protective factors that reduce 
the likelihood of mortality. 

Is this the most likely explanation for the results, or is there perhaps 
a better explanation that might function as a threat to the study's hypo- 
thesis regarding causation (or construct validity)? What might be a better 
causal explanation for the results of the study? One possible alternative ex- 
planation of the results might be that higher educational levels and family 
income reduce mortality rates because they are related to another factor 
that was not considered in the study. Considering that educational level is 
usually positively related to income level, higher levels of education tend 
to lead to higher levels of income. A higher level of income usually pro- 
vides access to a wider variety of privileges and services, such as access to 
higher-quality health care. Access to health care is therefore related to
education level and family income, and it is a plausible causal explanation for
the results.

Threats to construct validity arise from the unique aspects and design of the
study itself. Generally, these threats are features of a study that interfere 
with the researcher's ability to draw causal inferences from the study's re- 
sults (Kazdin, 2003c). In our previous discussions of internal and external 
validity, we were able to identify and categorize specific and well-defined 
threats. The threats to construct validity are more difficult to classify be- 
cause they can be anything that relates to the design of the study and the 
underlying theoretical construct under consideration. Despite this, the 
most common sources of threats to construct validity closely parallel 
some of the threats to external validity discussed earlier in this chapter,
such as conditions surrounding the experimental situation, experimenter 
expectancies, and characteristics of the participants. 
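
The mortality example from earlier in this section can be sketched as a small
simulation in Python. Everything below is hypothetical: the variable names,
the effect sizes, and above all the assumption that income affects mortality
only through access to health care. The sketch uses the standard first-order
partial correlation formula to ask how much of the income-mortality
relationship remains once health care access is held constant.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y with z held constant (first-order partial)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000
income = [rng.gauss(0, 1) for _ in range(n)]
# Assumed causal chain: income -> health care access -> mortality
health_care = [i + rng.gauss(0, 1) for i in income]
mortality = [-h + rng.gauss(0, 1) for h in health_care]

raw = pearson(income, mortality)
controlled = partial_corr(income, mortality, health_care)
print(f"income vs. mortality: {raw:.2f}")
print(f"same, holding health care constant: {controlled:.2f}")
```

Under these assumptions the raw correlation is clearly negative, while the
partial correlation is close to zero. A researcher who measured only income
and mortality would see the same negative relationship whether income was
protective in itself or merely a route to better health care, which is why the
causal interpretation is a question of construct validity.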



STATISTICAL VALIDITY 

The final type of validity that we will discuss in this chapter is the critically 
important yet often-overlooked concept of statistical validity. As its name 
implies, statistical validity (also referred to as statistical conclusion validity) refers 
to aspects of quantitative evaluation that affect the accuracy of the con- 
clusions drawn from the results of a study (Campbell & Stanley, 1966; 
Cook & Campbell, 1979). Statistical procedures are typically used to test 
the relationship between two or more variables and determine whether an 
observed statistical effect is due to chance or is a true reflection of a causal
relationship (Rosnow & Rosenthal, 2002). At its simplest level, statistical 
validity addresses the question of whether the statistical conclusions 
drawn from the results of a study are reasonable (Graziano & Raulin,
2004). 

The concepts of hypothesis testing and statistical evaluation are inter- 
related, and they provide the foundation for evaluating statistical validity. 
Statistical evaluation refers to the theoretical basis, rationale, and computa- 
tional aspects of the actual statistics used to evaluate the nature of the re- 
lationship between the independent and dependent variables. Among 
other things, the choice of statistical techniques often depends on the na- 
ture of the hypotheses being tested in the study. This is where the concept 
of hypothesis testing enters our discussion of statistical validity. Put simply, 
every study is driven by one or more hypotheses that guide the method- 
ological design of the study, the statistical analyses, and the resulting con- 
clusions. 

As discussed in Chapter 2, there are two main types of hypotheses in re-
search: the null hypothesis (usually designated as H0) and the experimen-
tal hypothesis (usually designated as H1, H2, H3, etc., depending on the
number of hypotheses). The experimental hypothesis represents the predicted 
relationship among the variables being examined in the study. Conversely, 
the null hypothesis represents a statement of no relationship among the vari- 
ables being examined (Christensen, 1988). 

At this point, we should review an important convention in research 
methodology as it relates to statistical analyses and hypothesis testing. Re-
jecting the null hypothesis is a necessary first step in evaluating the impact
of the independent variable (Graziano & Raulin, 2004). Therefore, in 
terms of statistical analyses, the focus is always on the null hypothesis, and 
not on the experimental hypotheses. Researchers reject the null hypothe- 
sis if a statistically significant difference is found between the experimen- 
tal and control conditions (Kazdin, 2003c). By contrast, researchers retain 
(or fail to reject) the null hypothesis if no statistically significant difference 
is found between the experimental and control conditions. 

As with the other forms of validity discussed throughout this chapter, 
there are numerous threats to statistical validity. The most common in- 
clude low statistical power, variability in the experimental procedures and 
participant characteristics, unreliability of measures, and multiple com-
parisons and error rates. Each of these threats can have a significant im-
pact on the study's ability to delineate causal relationships and rule out
plausible rival hypotheses. 

Low Statistical Power 

Low statistical power is the most common threat to statistical validity (Kep-
pel, 1991; Kirk, 1995). The presence of this threat produces a low proba-
bility of detecting a difference between experimental and control condi-
tions even when a difference truly exists. Low statistical power is directly
related to small effect sizes and small sample sizes, with the presence of
either increasing the likelihood that low statistical power is an issue in the
research design. Accordingly, low statistical power can cause a researcher
to conclude that there is no effect even when a true effect actually exists
(Rosnow & Rosenthal, 2002). The concept of power will be discussed
further in Chapter 7.
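
To make the link between effect size, sample size, and power concrete, the following minimal sketch approximates the power of a two-sided comparison between two equally sized groups. It is an illustration under stated assumptions rather than a procedure taken from this chapter: it uses a normal approximation, a standardized effect size, and a hypothetical function name (approx_power).

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Assumptions (not from the text): normal approximation, equal group
    sizes, and a standardized effect size (difference in means divided
    by the common standard deviation).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Power rises with sample size for a fixed, medium-sized effect.
for n in (10, 20, 64, 100):
    print(n, round(approx_power(0.5, n), 2))
```

With a small sample, even a real effect of moderate size is likely to be missed, which is the situation described above.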

Variability 

Variability is another threat to statistical validity that applies to both the 
participants and procedures used in a study. First, let's consider variability 
in methodological procedures. This concept includes a wide array of differences 
and questions that relate to the actual design aspects of the study. These 
differences can be found in the delivery of the independent variable, the 
procedures related to the execution of the study, variability in perfor- 
mance measures over time, and a host of other examples that are directly 
dependent on the unique design of a particular study. A related threat to 
statistical validity is variability in participant characteristics. Participants in a re- 
search study can vary along a variety of characteristics and dimensions, 
such as age, education, socioeconomic status, and race. As the diversity of 
participant characteristics increases, there is less likelihood that a differ- 
ence between the control and experimental conditions can be detected. 
When variability across these two broad sources is minimized, the likeli-
hood of detecting a true difference between the control and experimental
conditions increases. This threat to statistical validity must be considered 
at the planning stage of the study, and it is usually controlled through the 
use of homogeneous samples, strict and well-defined procedural proto- 
cols, and statistical controls at the data analysis stage. 

Unreliability of Measures 

Unreliability of measures used in a study is another source of variability 
that is a threat to statistical validity. This threat refers to whether the mea- 
sures used in the study assess the characteristics of interest in a consis- 
tent — or reliable — fashion (Kazdin, 2003c). If the research study's mea- 
sures are unreliable, then more random variability is introduced into the 
experimental design. As with participant and procedural variability, this 
type of variability decreases statistical power and makes it less likely that 
the statistical analyses will detect a true difference between the control and 
experimental conditions when a difference actually exists. 

Multiple Comparisons 

The final threat to statistical validity that we will consider is often referred 
to as multiple statistical comparisons and the resulting error rates (Kazdin, 
2003c; Rosnow & Rosenthal, 2002). This threat to statistical validity per- 
tains to the number of statistical analyses used to analyze the data obtained 
in a study. Generally, as the number of statistical analyses increases, so does 
the likelihood of finding a significant difference between the experimental 
and control conditions purely by mathematical chance. In other words, the 
significant finding is a mathematical artifact and does not reflect a true dif- 
ference between conditions. Accordingly, researchers should define their 
hypotheses before the study begins so as to conduct the minimum number 
of statistical analyses to address each of the hypotheses. 
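
The growth in chance findings can be illustrated with a short calculation. The sketch below assumes that the analyses are independent and that each uses the same significance level; the function names are hypothetical. The Bonferroni adjustment is one widely used correction that this chapter does not discuss, and it is shown only as an example of how the per-test threshold might be tightened.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level under a Bonferroni adjustment
    (an assumption for illustration; not covered in the text)."""
    return alpha / num_tests


for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3), bonferroni_alpha(k))
```

Under these assumptions, the chance of at least one spurious "significant" result climbs quickly as analyses are added, which is why limiting the number of planned comparisons matters.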

Rapid Reference 6.8 summarizes the threats to statistical validity that 
we have discussed in this section. 

Rapid Reference 6.8

Threats to Statistical Validity

• Low statistical power: Low probability of detecting a difference be-
tween experimental and control conditions even if a difference truly
exists.

• Procedural and participant variability: Variability in methodolog-
ical procedures and a host of participant characteristics, which de-
creases the likelihood of detecting a difference between the control
and experimental conditions.

• Unreliability of measures: Whether the measures used in a study
assess the characteristics of interest in a consistent manner. Unreliable
measures introduce more random variability into the research design,
which reduces statistical power.

• Multiple comparisons and error rates: The concept that, as the
number of statistical analyses increases, so does the likelihood of finding
a significant difference between the experimental and control condi-
tions purely by chance.



SUMMARY 

In this chapter, we have discussed the four types of validity that are criti- 
cal to sound research methodology. In addition, we discussed the major 
threats to each type of validity. Although each type of validity and its re- 
lated threats were presented independently, it is important to note that all 
types of validity are interdependent, and addressing one type may com- 
promise the other types. As was discussed, all of the broad threats to va- 
lidity should be considered at the design stage of the study if possible. In 
terms of priority, ensuring strong internal validity is regarded as more im- 
portant than external validity, because we must control for rival hypothe- 
ses before we can even begin to think about generalizing the results of a 
study. 

1. ________ is an important concept in research that refers to the concep-
tual and scientific soundness of a research study.

2. History, maturation, testing, statistical regression, and selection biases are
threats to ________.

3. External validity is concerned with the ________ of research results.

4. ________ refers to aspects of quantitative evaluation that af-
fect the accuracy of the conclusions drawn from the results of a study.

5. ________ refers to the congruence between the study's re-
sults and the theoretical underpinnings guiding the research.

Answers: 1. Validity; 2. internal validity; 3. generalizability; 4. Statistical conclusion; 5. Construct
validity

DATA PREPARATION, ANALYSES,
AND INTERPRETATION 



As we have discussed in previous chapters, in most research stud- 
ies, the researcher begins by generating a research question, fram- 
ing it into a testable (i.e., falsifiable) hypothesis, selecting an ap- 
propriate research design, choosing a suitable sample of research 
participants, and selecting valid and reliable methods of measurement. If 
all of these tasks have been carried out properly, then data analysis should
be fairly straightforward. Still, a variety of important steps must be
taken to ensure the integrity and validity of research
findings and their interpretation. 

In most types of research studies, the process of data analysis involves 
the following three steps: (1) preparing the data for analysis, (2) analyzing 
the data, and (3) interpreting the data (i.e., testing the research hypotheses 
and drawing valid inferences). Therefore, we will begin this chapter with a 
brief discussion of data cleaning and organization, followed by a nontech- 
nical overview of the most widely used descriptive and inferential statis- 
tics. We will conclude this chapter with a discussion of several important 
concepts that should be understood when interpreting and drawing infer- 
ences from research findings. Because a comprehensive discussion of sta- 
tistical techniques is well beyond the scope of this book, researchers seek- 
ing a more detailed review of statistical analyses should consult one of the 
statistical textbooks contained in the reference list. 



DATA PREPARATION

Virtually all studies, from surveys to randomized experimental trials, re-
quire some form of data collection and entry. Data represent the fruit of
researchers' labor because they provide the information that will ulti- 
mately allow them to describe phenomena, predict events, identify and 
quantify differences between conditions, and establish the effectiveness of 
interventions. Because of their critical nature, data should be treated with 
the utmost respect and care. In addition to ensuring the confidentiality 
and security of personal data (as discussed in Chapter 8), the researcher 
should carefully plan how the data will be logged, entered, transformed (as 
necessary), and organized into a database that will facilitate accurate and 
efficient statistical analysis. 

Logging and Tracking Data 

Any study that involves data collection will require some procedure to log 
the information as it comes in and track it until it is ready to be analyzed. 
Research data can come from any number of sources (e.g., personal 
records, participant interviews, observations, laboratory reports, and 
pretest and posttest measures). Without a well-established procedure, data 
can easily become disorganized, uninterpretable, and ultimately unusable. 

Although there is no one definitive method for logging and tracking 
data collection and entry, in this age of computers it might be considered 
inefficient and impractical not to take advantage of one of the many avail- 
able computer applications to facilitate the process. Taking the time to set 
up a recruitment and tracking system on a computer database (e.g., Mi- 
crosoft Access, Microsoft Excel, Claris FileMaker, SPSS, SAS) will provide 
researchers with up-to-date information throughout the study, and it will 
save substantial time and effort when they are ready to analyze their data 
and report the findings. 

One of the key elements of the data tracking system is the recruitment 
log. The recruitment log is a comprehensive record of all individuals ap- 
proached about participation in a study. The log can also serve to record 
the dates and times that potential participants were approached, whether
they met eligibility criteria, and whether they agreed and provided in-
formed consent to participate in the study. Importantly, for ethical rea-
sons, no identifying information should be recorded for individuals who
do not consent to participate in the research study. The primary purpose
of the recruitment log is to keep track of participant enrollment and to de- 
termine how representative the resulting cohort of study participants is of 
the population that the researcher is attempting to examine. 
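
One way to picture a recruitment log is as a list of simple records. The sketch below is only an illustration: the field names, the study identifier, and the dates are hypothetical, and a real study would follow its own approved protocol. It does show how the rule about not storing identifying information for non-consenting individuals can be enforced at the moment of entry.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RecruitmentEntry:
    """One approach to a potential participant (hypothetical layout)."""
    approached_at: datetime
    eligible: bool
    consented: bool
    study_id: Optional[str] = None  # recorded only after informed consent

    def __post_init__(self) -> None:
        # No identifier may be stored for someone who did not consent.
        if not self.consented and self.study_id is not None:
            raise ValueError("identifiers may be stored only after consent")


def summarize(log: List[RecruitmentEntry]) -> dict:
    """Counts needed to track enrollment over the course of the study."""
    return {
        "approached": len(log),
        "eligible": sum(e.eligible for e in log),
        "enrolled": sum(e.eligible and e.consented for e in log),
    }


# Illustrative entries only; the dates and ID are made up.
log = [
    RecruitmentEntry(datetime(2024, 3, 1, 9, 30), True, True, "P001"),
    RecruitmentEntry(datetime(2024, 3, 1, 10, 15), True, False),
    RecruitmentEntry(datetime(2024, 3, 2, 14, 0), False, False),
]
print(summarize(log))
```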

In some study settings, where records are maintained on all potential 
participants (e.g., treatment programs, schools, organizations), it may be 
possible for the researcher to obtain aggregate information on eligible in- 
dividuals who were not recruited into the study, either because they chose 
not to participate or because they were not approached by the researcher. 
Importantly, because these individuals did not provide informed consent, 
these data can only be obtained in aggregate, and they must be void of any 
identifying information. Given this type of aggregate information, the re- 
searcher would be able to determine whether the study sample is repre- 
sentative of the population. 

In addition to logging client recruitment, a well-designed tracking sys-
tem can provide the researcher with up-to-date information on the gen-
eral status of the study, including client participation, data collection, and
data entry.



DON'T FORGET



Record-Keeping Responsibilities

The lead researcher (referred to as principal investigator in grant-funded
research) is ultimately responsible for maintaining the validity and quality
of all research data, including the proper training of all research staff and
developing and enforcing policies for recording, maintaining, and storing
data. The researcher should ensure that

• research data are collected and recorded according to policy;

• research data are stored in a way that will ensure security and confi-
dentiality; and

• research data are audited on a regular basis to maintain quality control
and identify potential problems as they occur.

Data Screening 

Immediately following data collection, but prior to data entry, the re- 
searcher should carefully screen all data for accuracy. The promptness of 
these procedures is very important because research staff may still be able 
to recontact study participants to address any omissions, errors, or inac- 
curacies. In some cases, the research staff may inadvertently have failed to 
record certain information (e.g., assessment date, study site) or perhaps 
recorded a response illegibly. In such instances, the research staff may be 
able to correct the data themselves, if not too much time has elapsed. Be-
cause data collection and data entry are often done by different research
staff, it may be more difficult and time consuming to make such clarifica- 
tions once the information is passed on to data entry staff. 

One way to simplify the data screening process and make it more time 
efficient is to collect data using computerized assessment instruments. 
Computerized assessments can be programmed to accept only responses 
within certain ranges, to check for blank fields or skipped items, and even 
to conduct cross-checks between certain items to identify potential in- 
consistencies between responses. Another major benefit of these pro- 
grams is that the entered data can usually be electronically transferred into 
a permanent database, thereby automating the data entry procedure. Al- 
though this type of computerization may, at first glance, appear to be an 
impossible budgetary expense, it might be more economical than it seems 
when one considers the savings in staff time spent on data screening and 
entry. 

Whether it is done manually or electronically, data screening is an es- 
sential process in ensuring that data are accurate and complete. Generally, 
the researcher should plan to screen the data to make certain that (1) re-
sponses are legible and understandable, (2) responses are within an ac-
ceptable range, (3) responses are complete, and (4) all of the necessary in-
formation has been included.
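
Several of these checks lend themselves to automation. The following sketch shows one possible way to flag blank fields and out-of-range responses in a single record. The function names, field names, and acceptable ranges are assumptions chosen for illustration, and legibility still has to be judged by a person.

```python
def is_blank(value):
    """True when a field was skipped or left empty."""
    return value is None or str(value).strip() == ""


def screen_record(record, required_fields, valid_ranges):
    """Return a list of problems found in one participant record."""
    problems = []
    # Completeness: every required field must contain something.
    for field in required_fields:
        if is_blank(record.get(field)):
            problems.append(f"{field}: missing")
    # Range: numeric responses must fall inside the acceptable limits.
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if is_blank(value):
            continue  # already reported above if the field is required
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field}: not numeric ({value!r})")
            continue
        if not low <= number <= high:
            problems.append(f"{field}: {value!r} outside {low}-{high}")
    return problems


# Hypothetical record with an implausible age and a missing study site.
record = {"participant_id": "P001", "age": 230, "site": ""}
print(screen_record(record, ["participant_id", "age", "site"], {"age": (18, 99)}))
```

Running a check like this immediately after collection leaves time to recontact the participant or the staff member who recorded the response.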



Constructing a Database 

Once data are screened and all corrections are made, the data should be 
entered into a well-structured database. When planning a study, the re- 
searcher should carefully consider the structure of the database and how 
it will be used. In many cases, it may be helpful to think backward and to 
begin by anticipating how the data will be analyzed. This will help the re- 
searcher to figure out exactly which variables need to be entered, how they 
should be ordered, and how they should be formatted. Moreover, the sta- 
tistical analysis may also dictate what type of program you choose for your 
database. For example, certain advanced statistical analyses may require 
the use of specific statistical programs. 

While designing the general structure of the database, the researcher 
must carefully consider all of the variables that will need to be entered. 
Forgetting to enter one or more variables, although not as problematic as 
failing to collect certain data elements, will add substantial effort and ex-
pense because the researcher must then go back to the hard data to find
the missing data elements.



DON'T FORGET



Retaining Data Records 

Researchers should retain study 
data for a minimum period of 5 
years after publication of their data 
in the event that questions or con- 
cerns arise regarding the findings. 
The advancement of science relies 
on the scientific community's over- 
all confidence in disseminated 
findings, and the existence of the 
primary data serves to instill such 
confidence. 



The Data Codebook 

In addition to developing a well-structured database, researchers should
take the time to develop a data codebook. A data codebook is a written or
computerized list that provides a clear and comprehensive description of
the variables that will be included in the database. A detailed codebook is
essential when the researcher begins to analyze the data. Moreover, it serves
as a permanent database guide, so that the researcher, when attempting to 
reanalyze certain data, will not be stuck trying to remember what certain 
variable names mean or what data were used for a certain analysis. Ulti- 
mately, the lack of a well-defined data codebook may render a database 
uninterpretable and useless. At a bare minimum, a data codebook should 
contain the following elements for each variable: 

• Variable name 

• Variable description 

• Variable format (number, date, text)

• Instrument or method of collection 

• Date collected 

• Respondent or group 

• Variable location (in database) 

• Notes 
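
The elements listed above map naturally onto a simple record structure. The sketch below is one hypothetical way to hold a codebook in a program, with a helper that reports database columns that have no codebook entry. The class name, field names, and sample values are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    """One variable in the codebook; fields mirror the list above."""
    name: str
    description: str
    data_format: str      # e.g., "number", "date", "text"
    instrument: str       # instrument or method of collection
    date_collected: str
    respondent: str       # respondent or group
    location: str         # where the variable lives in the database
    notes: str = ""


def undocumented_variables(database_columns, codebook):
    """Columns present in the database but absent from the codebook."""
    documented = {entry.name for entry in codebook}
    return [column for column in database_columns if column not in documented]


# Hypothetical entry for a weekly attendance variable.
codebook = [
    CodebookEntry(
        name="q1",
        description="Sessions attended in week 1",
        data_format="number",
        instrument="Attendance sheet",
        date_collected="Week 1",
        respondent="Clinic staff",
        location="attendance table, column 2",
    )
]
print(undocumented_variables(["q1", "q2"], codebook))
```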



DON'T FORGET



Defining Variables Within a Database

Certain databases, particularly statistical programs (e.g., SPSS), allow the
researcher to enter a wide range of descriptive information about each
variable, including the variable name, the type of data (e.g., numeric, text,
currency, date), label (how it will be referred to in data printouts), how
missing data are coded or treated, and measurement scale (e.g., nominal,
ordinal, interval, or ratio). Although these databases are extremely helpful
and should be used whenever possible, they do not substitute for a com-
prehensive codebook, which includes separate information about the dif-
ferent databases themselves (e.g., which databases were used for each set
of analyses).



Data Entry

After the data have been screened for completeness and accuracy, and the
researcher has developed a well-structured database and a detailed code-
book, data entry should be fairly straightforward. Nevertheless, many er-
rors can occur at this stage. Therefore, it is critical that all data-entry staff
are properly trained and maintain the highest level of accuracy when in- 
putting data. One way of ensuring the accuracy of data entry is through 
double entry. In the double-entry procedure, data are entered into the data- 
base twice and then compared to determine whether there are any dis- 
crepancies. The researcher or data entry staff can then examine the dis- 
crepancies and determine whether they can be resolved and corrected or 
if they should simply be treated as missing data. Although the double- 
entry process is a very effective way to identify entry errors, it may be dif- 
ficult to manage and may not be time or cost effective. 
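
The comparison step of the double-entry procedure is easy to automate once both passes are stored electronically. The sketch below assumes, purely for illustration, that each pass is held as a dictionary keyed by a record identifier, with each record being a dictionary of field values; the function name is hypothetical.

```python
def find_discrepancies(first_pass, second_pass):
    """Compare two independently entered copies of the same data.

    Each argument maps a record ID (assumed to be a string) to a
    dictionary of field values. Returns tuples of
    (record_id, field, value_in_first, value_in_second).
    """
    discrepancies = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies


# Illustrative data: the second pass contains one keying error.
first = {"P001": {"age": 34, "q1": 2}, "P002": {"age": 41, "q1": 3}}
second = {"P001": {"age": 34, "q1": 2}, "P002": {"age": 14, "q1": 3}}
print(find_discrepancies(first, second))
```

Each flagged item can then be checked against the original form and either corrected or treated as missing.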

As an alternative to double entry, the researcher may design a standard 
procedure for checking the data for inaccuracies. Such procedures typi- 
cally entail a careful review of the inputted data for out-of-range values, 
missing data, and incorrect formatting. Much of this work can be accom- 
plished by running descriptive analyses and frequencies on each variable. 
In addition, many database programs (e.g., Microsoft Excel, Microsoft 
Access, SPSS) allow the researcher to define the ranges, formats, and types 
of data that will be accepted into certain data fields. These databases will 
make it impossible to enter information that does not meet the preset cri- 
teria. Defining data entry criteria in this manner can prevent many errors 
and it may substantially reduce the time spent on data cleaning. 



Transforming Data 

After the data have been entered and checked for inaccuracies, the re- 
searcher or data entry staff will undoubtedly be required to make certain 
transformations before the data can be analyzed. These transformations 
typically involve the following: 

• Identifying and coding missing values 

• Computing totals and new variables 

• Reversing scale items 

• Recoding and categorization 

Identifying and Coding Missing Values

Inevitably, all databases and most variables will have some number of 
missing values. This is a result of either study participants' failing to re- 
spond to certain questions, missed observations, or inaccurate data that 
were rejected from the database. Researchers and data analysts often do 
not want to include certain cases with missing data because they may po- 
tentially skew the results. Therefore, most statistical packages (e.g., SPSS, 
SAS) will provide the option of ignoring cases in which certain variables 
are considered missing, or they will automatically treat blank values as 
missing. These programs also typically allow the researcher to designate 
specific values to represent missing data (e.g., -99). A small sample of the
many techniques used for imputing missing data values is discussed in
Rapid Reference 7.1.



Rapid Reference 7.1



Missing Value Imputation 

Virtually all databases have some number of missing values. Unfortunately, 
statistical analysis of data sets with missing values can result in biased re- 
sults and incorrect inferences. Although numerous techniques have been 
offered to impute missing values, there is an ongoing debate in contem- 
porary statistics as to which technique is the most appropriate. A few of 
the more widely used imputation techniques include the following: 

Hot deck imputation: In this imputation technique, the researcher 
matches participants on certain variables to identify potential donors. 
Missing values are then replaced with values taken from matching respon-
dents (i.e., respondents who are matched on a set of relevant factors).

Predicted mean imputation: Imputed values are predicted using cer- 
tain statistical procedures (i.e., linear regression for continuous data and 
discriminant function for dichotomous or categorical data). 

Last value carried forward: Imputed values are based on previously 
observed values. This method can be used only for longitudinal variables, 
for which participants have values from previous data collection points. 

Group means: Imputed values are determined by calculating the vari-
able's group mean (or mode, in the case of categorical data).
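
Two of the simpler techniques above can be sketched in a few lines. The code below assumes that missing responses were entered with a designated code of -99, converts that code to an explicit missing marker, and then illustrates group-mean imputation and last value carried forward. The function names are hypothetical, and the sketch is not a recommendation of either technique over the alternatives.

```python
from statistics import mean

MISSING_CODE = -99  # assumed designated value for missing data


def to_missing(values, missing_code=MISSING_CODE):
    """Replace the designated missing code with None."""
    return [None if v == missing_code else v for v in values]


def impute_group_mean(values):
    """Fill each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    group_mean = mean(observed)
    return [group_mean if v is None else v for v in values]


def last_value_carried_forward(values):
    """Fill each missing value with the most recent observed value.

    Intended for repeated measurements on one participant; values
    before the first observation stay missing.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


scores = to_missing([3, -99, 5])
print(impute_group_mean(scores))
print(last_value_carried_forward([None, 2, None, None, 7]))
```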

Computing Totals and New Variables

In certain instances, the researcher may want to create new variables based 
on values from other variables. For example, suppose a researcher has data
on the number of treatment sessions that clients in two different treatments
attended during the first month of treatment. The researcher would have a
total of four variables, each representing the number of sessions attended
during one week of that month. Let's call them q1, q2, q3, and q4. If
the researcher wanted to analyze monthly attendance by the different
treatments, he or she would have to compute a new variable. This could be
done with the following transformation:

total = q1 + q2 + q3 + q4
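
In a scripting language, the same computation might look like the sketch below. It assumes each client's data are held in a dictionary with keys q1 through q4, and it returns no total when any week is missing rather than silently undercounting; both choices are assumptions made for illustration.

```python
WEEKLY_VARIABLES = ("q1", "q2", "q3", "q4")


def monthly_total(record):
    """Sum weekly attendance; return None if any week is missing."""
    weekly = [record.get(name) for name in WEEKLY_VARIABLES]
    if any(value is None for value in weekly):
        return None
    return sum(weekly)


client = {"q1": 2, "q2": 1, "q3": 3, "q4": 2}
client["total"] = monthly_total(client)
print(client["total"])
```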

Still another reason for transforming variables is that the variable may 
not be normally distributed (see Rapid Reference 7.2). This can substan- 
tially alter the results of the data analysis. In such instances, certain data 
transformations (see Rapid Reference 7.3) may serve to normalize the dis- 
tribution and improve the accuracy of outcomes. 
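
Three transformations commonly used for this purpose are the square root, log, and inverse transformations. The sketch below implements each one, together with a simple moment-based skewness statistic for comparing a variable before and after transformation. The function names, the choice of base-10 logarithms, and the way the constants are chosen are assumptions made for illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sqrt_transform(values):
    """Square root, after shifting so that no value is negative."""
    shift = max(0.0, -min(values))
    return [math.sqrt(v + shift) for v in values]


def log_transform(values):
    """Base-10 log, after shifting so that the minimum is at least 1."""
    shift = max(0.0, 1.0 - min(values))
    return [math.log10(v + shift) for v in values]


def inverse_transform(values):
    """Reciprocal of each value; note that this reverses score order."""
    if any(v == 0 for v in values):
        raise ValueError("cannot invert a value of zero")
    return [1.0 / v for v in values]


# Illustrative, positively skewed data (made up for this example).
data = [1, 2, 2, 3, 3, 3, 4, 10, 25, 60]
print(round(skewness(data), 2), round(skewness(log_transform(data)), 2))
```

Comparing the skewness before and after a transformation gives a quick, if rough, indication of whether the transformation moved the variable closer to symmetry.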

Rapid Reference 7.2

Normal Distributions

A normal distribution is a distribution of the values of a variable that,
when plotted, produces a symmetrical, bell-shaped curve that rises smoothly
from a small number of cases at each extreme to a large number of cases in
the middle.



Rapid Reference 7.3

Data Transformations

Most statistical procedures assume that the variables being analyzed are
normally distributed. Analyzing variables that are not normally distributed
can lead to serious overestimation (Type I error) or underestimation
(Type II error). Therefore, before analyzing their data, researchers should
carefully examine variable distributions. Although this is often done by
simply looking over the frequency distributions, there are many more
objective methods of determining whether variables are normally distrib-
uted. Typically, these involve examining each variable's skewness, which
measures the overall lack of symmetry of the distribution, and whether it
looks the same to the left and right of the center point; and its kurtosis,
which measures whether the data are peaked or flat relative to a normal
distribution. Unfortunately, many variables in the social sciences and within
particular sample populations are not normally distributed. Therefore, re-
searchers often rely on one of several transformations to potentially im-
prove the normality of certain variables. The most frequently used trans-
formations are the square root transformation, the log transformation,
and the inverse transformation.

Square root transformation: Described simply, this type of transfor-
mation involves taking the square root of each value within a certain vari-
able. The one caveat is that you cannot take a square root of a negative
number. Fortunately, this can be easily remedied by adding a constant,
such as 1, to each item before computing the square root.

Log transformation: There is a wide variety of log transformations. In
general, however, a logarithm is the power (also known as the exponent)
to which a base number has to be raised to get the original number. As
with square root transformation, if a variable contains values less than 1, a
constant must be added to move the minimum value of the distribution.

Inverse transformation: This type of transformation involves taking
the inverse of each value by dividing it into 1. For example, the inverse of
3 would be computed as 1/3. Essentially, this procedure makes very small
values very large, and very large values very small, and it has the effect of
reversing the order of a variable's scores. Therefore, researchers using this
transformation procedure should be careful not to misinterpret the
scores following their analysis.



Reversing Scale Items

Many instruments and measures use items with reversed scales to decrease
the likelihood of participants' falling into what is referred to as a "response
set." A response set occurs when a participant begins to respond in a
patterned manner to questions or statements on a test or assessment
measure, regardless of the content of each query or statement. For example,
an individual may answer false to all test items, or may provide a 1 for all
items requesting a response from 1 to 5. Here's an example of how reverse
scale items work. Let's say that participants in a survey are asked to
indicate their levels of agreement, from 1 to 5, with a series of
statements. In this survey, 1 corresponds with completely disagree and 5
corresponds with completely agree. The researcher may decide, however, to
reverse-scale some of the items on the survey, so that 1 corresponds with
completely agree and 5 corresponds with completely disagree. This may
reduce the likelihood that participants will fall into a response set.
Before data can be analyzed, it is important that all reversed items are
recoded so that all of the responses fall in the same direction.
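
Recoding a reversed item is a one-line computation: subtract the response from the sum of the scale's lowest and highest values. The sketch below assumes a 1-to-5 scale and a hypothetical set of reversed item names.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response so that it points in the same direction as the
    non-reversed items (1 becomes 5, 2 becomes 4, and so on)."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response is outside the scale")
    return scale_min + scale_max - response


REVERSED_ITEMS = {"item3", "item7"}  # hypothetical reversed items


def recode_survey(responses):
    """Return a copy of the responses with reversed items recoded."""
    return {
        item: reverse_score(value) if item in REVERSED_ITEMS else value
        for item, value in responses.items()
    }


print(recode_survey({"item1": 4, "item3": 1, "item7": 2}))
```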

Recoding Variables 

Some variables may be more easily analyzed if they are recoded into cate- 
gories. For example, a researcher may wish to collapse income estimates 
or ages into specific ranges. This is an example of turning a continuous 
variable into a categorical variable (as was discussed in Chapter 2). Al- 
though categorizing continuous variables may ultimately reduce their 
specificity, in some cases it may be warranted to simplify data analysis and 
interpretation. In other instances, it may be necessary to recategorize or 
recode categorical variables by combining them into fewer categories. 
This is often the case when variables have so many categories that certain 
categories are sparsely populated, which may violate the assumptions of 
certain statistical analyses. To resolve this issue, researchers may choose to 
combine or collapse certain categories. 
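
Both kinds of recoding can be sketched briefly. The first function below turns a continuous value, such as age, into a labeled range; the second combines sparsely populated categories into a single group. The age bands, the minimum count, and the "Other" label are hypothetical choices that a researcher would set to suit the study.

```python
from collections import Counter

# Hypothetical age bands: (lowest age, highest age, label).
AGE_BANDS = [(18, 29, "18-29"), (30, 44, "30-44"), (45, 64, "45-64")]


def recode_into_ranges(value, bands=AGE_BANDS):
    """Return the label of the band containing value, or None."""
    for low, high, label in bands:
        if low <= value <= high:
            return label
    return None


def collapse_sparse_categories(values, min_count, other_label="Other"):
    """Merge categories with fewer than min_count cases into one."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


print(recode_into_ranges(27))
print(collapse_sparse_categories(["a", "a", "b", "c", "a"], min_count=2))
```

Because collapsing discards detail, it is usually worth keeping the original variable in the database alongside the recoded one.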

Once the data have been screened, entered, cleaned, and transformed, 
they should be ready to be analyzed. It is possible, of course, that the data 
will need to be recoded or transformed again during the analyses. In fact, 
the need for many of the transformations discussed previously will not be 
identified until the analyses have begun. Still, taking the time to carefully 
prepare the data first should make data analysis more efficient and im- 
prove the overall validity of the study's findings. 



DATA ANALYSIS 

As mentioned earlier, research data can be seen as the fruit of researchers' 
labor. If a study has been conducted in a scientifically rigorous manner, the 
data will hold the clues necessary to answer the researchers' questions. To 
unlock these clues, researchers typically rely on a variety of statistical 
procedures. These statistical procedures allow researchers to describe groups 
of individuals and events, examine the relationships between different 
variables, measure differences between groups and conditions, and exam- 
ine and generalize results obtained from a sample back to the population 
from which the sample was drawn. Knowledge about data analysis can 
help a researcher interpret data for the purpose of providing meaningful 
insights about the problem being examined. 

Although a comprehensive review of statistical procedures is beyond 
the scope of this text, in general, they can be broken down into two major 
areas: descriptive and inferential. Descriptive statistics allow the researcher to 
describe the data and examine relationships between variables, while 
inferential statistics allow the researcher to examine causal relationships. In many 
cases, inferential statistics allow researchers to go beyond the parameters 
of their study sample and draw conclusions about the population from 
which the sample was drawn. This section will provide a brief overview of 
some of the more commonly used descriptive and inferential statistics. 

Descriptive Statistics 

As their name implies, descriptive statistics are used to describe the data 
collected in research studies and to accurately characterize the variables 
under observation within a specific sample. Descriptive analyses are fre- 
quently used to summarize a study sample prior to analyzing a study's pri- 
mary hypotheses. This provides information about the overall representa- 
tiveness of the sample, as well as the information necessary for other 
researchers to replicate the study, if they so desire. In other research ef- 
forts (i.e., purely descriptive studies), precise and comprehensive descrip- 
tions may be the primary focus of the study. In either case, the principal 
objective of descriptive statistics is to accurately describe distributions of 
certain variables within a specific data set. 

There is a variety of methods for examining the distribution of a 
variable. Perhaps the most basic method, and the starting point and 
foundation of virtually all statistical analyses, is the frequency distribution. A 
frequency distribution is simply a complete list of all possible values or scores 
for a particular variable, along with the number of times (frequency) that 
each value or score appears in the data set. For example, teachers and in- 
structors who want to know how their classes perform on certain exams 
will need to examine the overall distribution of the test scores. The teacher 
would begin by sorting the scores so that they go from the lowest to the 
highest and then count the number of times that each score occurred. This 
information can be delineated in what is known as a frequency table, which 
is illustrated in Table 7.1. 

Table 7.1 Frequency Distribution of Test Scores 

Value    Frequency    Cumulative Frequency 
  71         1                 1 
  76         1                 2 
  78         2                 4 
  81         2                 6 
  82         1                 7 
  83         1                 8 
  84         2                10 
  85         2                12 
  86         2                14 
  87         1                15 
  89         1                16 
  90         2                18 
  94         3                21 
  98         1                22 
 100         1                23 

To make the distribution of scores even more informative, the teacher 
could group the test scores together in some manner. For example, the 
teacher may decide to group the test scores from 71 to 75, 76 to 80, 81 to 
85, 86 to 90, 91 to 95, and 96 to 100. This type of grouping would result in 
the frequency distribution shown in Table 7.2. 

Table 7.2 Grouped Frequency Distribution of Test Scores 

Value     Frequency    Cumulative Frequency 
71-75         1                 1 
76-80         3                 4 
81-85         8                12 
86-90         6                18 
91-95         3                21 
96-100        2                23 

Still another way that this distribution may be depicted is in what is 
known as a histogram. A histogram (see Figure 7.1) is nothing more than a 
graphic display of the same information contained in the frequency tables 
shown in Tables 7.1 and 7.2. 
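For readers who keep their data in electronic form, the following Python sketch shows one way the frequency table, the grouped table, and a rough text histogram could be produced. The scores are simply the values in Table 7.1 written out one per test; the function names are our own and are not taken from any statistics package.

```python
from collections import Counter
from itertools import accumulate

# The 23 test scores from Table 7.1, each value repeated by its frequency.
scores = [71, 76, 78, 78, 81, 81, 82, 83, 84, 84, 85, 85,
          86, 86, 87, 89, 90, 90, 94, 94, 94, 98, 100]

def frequency_table(values):
    """Return (value, frequency, cumulative frequency) rows, lowest value first."""
    counts = Counter(values)
    ordered = sorted(counts)
    freqs = [counts[v] for v in ordered]
    return list(zip(ordered, freqs, accumulate(freqs)))

def grouped_frequency_table(values, start, width):
    """Group values into ranges of equal width, beginning at start."""
    counts = Counter((v - start) // width for v in values)
    rows, running = [], 0
    for b in range(min(counts), max(counts) + 1):
        running += counts[b]          # a Counter returns 0 for an empty range
        low = start + b * width
        rows.append((f"{low}-{low + width - 1}", counts[b], running))
    return rows

table = frequency_table(scores)
grouped = grouped_frequency_table(scores, start=71, width=5)

# A rough text histogram: one '#' per test score in each range.
for label, freq, _ in grouped:
    print(f"{label:>6} {'#' * freq}")
```

The cumulative frequency in the last row of either table should always equal the number of scores, which provides a quick check that no score has been dropped or counted twice.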



Figure 7.1 Grouped frequency histogram of test scores. (The histogram 
plots the score groups from Table 7.2 on the horizontal axis against 
frequency, from 1 to 8, on the vertical axis.) 

Although frequency tables and histograms provide researchers with a 
general overview of the distribution, there are more precise ways of de- 
scribing the shape of the distribution of values for a specific variable. 
These include measures of central tendency and dispersion. 

Central Tendency 

The central tendency of a distribution is a number that represents the typical 
or most representative value in the distribution. Measures of central ten- 
dency provide researchers with a way of characterizing a data set with a 
single value. The most widely used measures of central tendency are the 
mean, median, and mode. 

The mean, except in statistics courses and scientific journals, is more 
commonly known as the average. The mean is perhaps the most widely 
used and reported measure of central tendency. The mean is quite simple 
to calculate: Simply add all the numbers in the data set and then divide by 
the total number of entries. The result is the mean of the distribution. For 
example, let's say that we are trying to describe the mean age of a group 
of 10 study participants with the following ages: 

34 27 23 23 26 27 28 23 32 41 

The summed ages for the 10 participants is 284. Therefore, the mean age 
of the sample is 284/10 = 28.40. 

The mean is quite accurate when the data set is normally distributed. 
Unfortunately, the mean is strongly influenced by extreme values or out- 
liers. Therefore, it may be misleading in data sets in which the values are 
not normally distributed, or where there are extreme values at one end of 
the data set (skewed distributions). 

For example, consider a situation in which study participants report an- 
nual earnings of between $25,000 and $40,000. The mean annual income 
for the sample might wind up being around $35,000. Now consider what 
would happen if one or two of the participants reported earnings of 
$100,000 or more. Their substantially higher salaries (outliers) would 
disproportionately increase the mean income for the entire sample. In such 
instances, a median or mode may provide much more meaningful 
summary information. 

The median, as implied by its name, is the middle value in a distribution 
of values. To calculate the median, simply sort all of the values from low- 
est to highest and then identify the middle value. The middle value is the 
median. For example, sorting the set of ages in the previous example 
would result in the following: 

23 23 23 26 27 27 28 32 34 41 

In this instance, the median is 27, because the two middle values are 
both 27, with four values on either side. If the two values were different, 
you would simply split the difference to get the median. For example, if the 
two middle values were 27 and 28, the median would be 27.5. Calculation 
of the median is even simpler when the data set has an odd number of 
values. In these cases, the median is simply the value that falls exactly in the 
middle. 

The mode is yet another useful measure of central tendency. The mode 
is the value that occurs most frequently in a set of values. To find the mode, 
simply count the number of times (frequency) that each value appears in a 
data set. The value that occurs most frequently is the mode. For example, 
by examining the sorted distribution of ages listed below, we could easily 
see that the most prevalent age in the sample is 23, which is therefore the 
mode. 

23 23 23 26 27 27 28 32 34 41 

With larger data sets, the mode is more easily identified by examining a 
frequency table, as described earlier. The mode is very useful with nomi- 
nal and ordinal data or when the data are not normally distributed, because 
it is not influenced by extreme values or outliers. Therefore, the mode is a 
good summary statistic even in cases when distributions are skewed. Also 
note that a distribution can have more than one mode. Two modes would 
make the distribution bimodal, while a distribution having three modes 
would be referred to as trimodal. 

Interestingly, although the three measures of central tendency resulted 
in different values in the previous examples, in a perfectly normal distri- 
bution, the mean, median, and mode would all be the same. 
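The three measures can be checked with Python's standard `statistics` module, as in the brief sketch below. The ages are the 10 participant ages used above; the income figures are hypothetical values of our own, chosen only to illustrate how a single outlier moves the mean far more than it moves the median.

```python
import statistics

ages = [34, 27, 23, 23, 26, 27, 28, 23, 32, 41]

mean_age = statistics.mean(ages)        # sum of the ages divided by 10
median_age = statistics.median(ages)    # midpoint of the two middle values
modes = statistics.multimode(ages)      # list of the most frequent value(s)
print(mean_age, median_age, modes)

# Hypothetical incomes: the same sample without and with one high earner.
incomes = [25_000, 30_000, 35_000, 38_000, 40_000]
with_outlier = incomes + [100_000]

print(statistics.mean(incomes), statistics.median(incomes))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

We use `multimode` rather than `mode` here because it returns every value tied for most frequent, which makes bimodal and trimodal distributions visible.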

Dispersion 

Measures of central tendency, like the mean, describe the most likely value, 
but they do not tell us anything about how the values vary. For example, 
two sets of data can have the same mean, but they may vary greatly in the 
way that their values are spread out. Another way of describing the shape 
of a distribution is to examine this spread. The spread, more technically re- 
ferred to as the dispersion, of a distribution provides us with information 
about how tightly grouped the values are around the center of the distri- 
bution (e.g., around the mean, median, and/or mode). The most widely 
used measures of dispersion are range, variance, and standard deviation. 
The range of a distribution tells us the smallest possible interval in which 
all the data in a certain sample will fall. Quite simply, the range is the dif- 
ference between the highest and lowest values in a distribution. Therefore, 
the range is easily calculated by subtracting the lowest value from the high- 
est value. Using our previous example, the range of ages for the study 
sample would be: 

41 - 23 = 18 

Because it depends on only two values in the distribution, it is usually a 
poor measure of dispersion, except when the sample size is particularly 
large. 

A more precise measure of dispersion, or spread around the mean of a 
distribution, is the variance. The variance gives us a sense of how closely 
concentrated a set of values is around its average value, and is calculated in 
the following manner: 

1. Subtract the mean of the distribution from each of the values. 

2. Square each result. 

3. Add all of the squared results. 

4. Divide the result by the number of values minus 1. 

The variance of the set of 10 participant ages would therefore be 
calculated in the following manner: 

Variance = [(23 - 28.40)² + (23 - 28.40)² + (23 - 28.40)² + (26 - 28.40)² 
+ (27 - 28.40)² + (27 - 28.40)² + (28 - 28.40)² + (32 - 28.40)² 
+ (34 - 28.40)² + (41 - 28.40)²] ÷ 9 = 33.37 

The variance of a distribution gives us an average of how far, in squared 
units, the values in a distribution are from the mean, which allows us to see 
how closely concentrated the scores in a distribution are. 
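The four steps translate directly into code. The sketch below is a minimal illustration using the same 10 ages; `sample_variance` is our own helper, written out step by step rather than taken from a library, and the standard library's `statistics.variance` is used only as a cross-check. The squared deviations sum to 300.40, so the computed variance is about 33.38, which agrees with the figure reported in the text up to rounding.

```python
import statistics

ages = [34, 27, 23, 23, 26, 27, 28, 23, 32, 41]

def sample_variance(values):
    """Variance with an n - 1 denominator, following the four steps in the text."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]      # Step 1: subtract the mean
    squared = [d ** 2 for d in deviations]       # Step 2: square each result
    total = sum(squared)                         # Step 3: add the squared results
    return total / (len(values) - 1)             # Step 4: divide by n - 1

variance = sample_variance(ages)
value_range = max(ages) - min(ages)              # highest minus lowest value

print(round(variance, 2), value_range)
print(round(statistics.variance(ages), 2))       # cross-check with the library
```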

Another measure of the spread of values around the mean of a distri- 
bution is the standard deviation. The standard deviation is simply the square 
root of the variance. Therefore, the standard deviation for the set of par- 
ticipant ages is: 



√33.37 = 5.78 

By taking the square root of the variance, we can avoid having to think in 
terms of squared units. The variance and the standard deviation of distri- 
butions are the basis for calculating many other statistics that estimate 
associations and differences between variables. In addition, they provide us 
with important information about the values in a distribution. For ex- 
ample, if the distribution of values is normal, or close to normal, one can 
conclude the following with reasonable certainty: 

1. Approximately 68% of the values fall within 1 standard devia- 
tion of the mean. 

2. Approximately 95% of the values fall within 2 standard devia- 
tions of the mean. 

3. Approximately 99.7% of the values fall within 3 standard 
deviations of the mean. 

Therefore, assuming that the distribution is normal, we can estimate that 
because the mean age of participants was 28.40 and the standard deviation 
was 5.78, approximately 68% of the participants are within ±5.78 years (1 
standard deviation) of the mean age of 28.40. Similarly, we can estimate 
that 95% of the participants are within ±11.56 years (2 standard devia- 
tions) of the mean age of 28.40. This information has several important 
applications. First, like the measures of central tendency, it allows the re- 
searcher to describe the overall characteristics of a sample. Second, it al- 
lows researchers to compare individual participants on a given variable 
(e.g., age). Third, it provides a way for researchers to compare an individ- 
ual participant's performance on one variable (e.g., IQ score) with his or 
her performance on another (e.g., SAT score), even when the variables are 
measured on entirely different scales. 
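The third application rests on converting raw scores to standard (z) scores, that is, expressing each score as a number of standard deviations above or below its mean. The sketch below illustrates the idea. The IQ and SAT means and standard deviations are illustrative assumptions, not figures from this text, and `share_within` and `z_score` are our own helper names.

```python
import statistics

ages = [34, 27, 23, 23, 26, 27, 28, 23, 32, 41]
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)     # square root of the n - 1 variance

def share_within(values, center, spread, k):
    """Proportion of values lying within k spreads of the center."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= v <= high for v in values) / len(values)

def z_score(value, mean, sd):
    """Express a raw score in standard-deviation units."""
    return (value - mean) / sd

print(round(sd_age, 2))
print(share_within(ages, mean_age, sd_age, 1))   # observed share within 1 SD
print(share_within(ages, mean_age, sd_age, 2))   # observed share within 2 SDs

# Assumed norms for illustration only: IQ (mean 100, SD 15), SAT (mean 500, SD 100).
print(z_score(115, 100, 15), z_score(600, 500, 100))
```

In this small and somewhat skewed sample, the observed shares do not match the 68% and 95% figures, which is a reminder that those percentages are estimates that hold only to the extent that the distribution is normal. Under the assumed norms, an IQ of 115 and an SAT score of 600 both fall exactly 1 standard deviation above their respective means, so the two performances are comparable even though the scales differ.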

Measures of Association 

In addition to describing the shape of variable distributions, another im- 
portant task of descriptive statistics is to examine and describe the rela- 
tionships or associations between variables. 

Correlations are perhaps the most basic and most useful measure of as- 
sociation between two or more variables. Expressed in a single number 
called a correlation coefficient (r), correlations provide information about the 
direction of the relationship (either positive or negative) and the intensity 
of the relationship (-1.0 to +1.0). Furthermore, tests of correlations will 
provide information on whether the correlation is statistically significant. 
There is a wide variety of correlations that, for the most part, are deter- 
mined by the type of variable (e.g., categorical, continuous) being ana- 
lyzed. 

With regard to the direction of a correlation, if two variables tend to 
move in the same direction (e.g., height and weight), they would be con- 
sidered to have a positive or direct relationship. Alternatively, if two variables 
move in opposite directions (e.g., cigarette smoking and lung capacity), 
they are considered to have a negative or inverse relationship. Figure 7.2 gives 
examples of both types. 

Figure 7.2 Positive and negative correlation directions. 

Correlation coefficients range from -1.0 to +1.0. The sign of the 
coefficient represents the direction of the relationship. For example, a 
correlation of .78 would indicate a positive or direct correlation, while a 
correlation of -.78 would indicate a negative or inverse correlation. The 
coefficient (value) itself indicates the strength of the relationship. The 
closer its absolute value gets to 1.0 (whether the coefficient is negative or 
positive), the stronger the relationship. In general, correlations of .01 to 
.30 are considered small, correlations of .30 to .70 are considered moderate, 
correlations of .70 to .90 are considered large, and correlations of .90 to 
1.00 are considered very large. Importantly, these are only rough guidelines. 
A number of other factors, such as sample size, need to be considered when 
interpreting correlations. 

In addition to the direction and strength of a correlation, the coefficient 
can be used to determine the proportion of variance accounted for by the 
association. This is known as the coefficient of determination (r²). The 
coefficient of determination is calculated quite easily by squaring the 
correlation coefficient. For example, if we found a correlation of .70 between 
cigarette smoking and use of cocaine, we could calculate the coefficient of 
determination in the following manner: 

.70 × .70 = .49 

The coefficient of determination is then transformed into a percentage. 
Therefore, a correlation of .70, as indicated in the equation, explains 
approximately 49% of the variance. In this example, we could conclude that 
49% of the variance in cocaine use is accounted for by cigarette smoking. 
Alternatively, a correlation of .20 would have a coefficient of 
determination of .04 (.20 × .20 = .04). Only 4% of the variance would be 
accounted for, strongly indicating that other variables are 
likely involved. Importantly, as the reader might remember, correlation is 
not causation. Therefore, we cannot infer from this correlation that 
cigarette smoking causes or influences cocaine use. It is just as possible that 
cocaine use causes cigarette smoking, or that both unhealthy behaviors are 
caused by a third unknown variable. 

Although correlations are typically regarded as descriptive in nature, 
they can, unlike measures of central tendency and dispersion, be 
tested for statistical significance. Tests of significance allow us to estimate 
the likelihood that a relationship between variables in a sample actually 
exists in the population and is not simply the result of chance. In very 
general terms, the significance of a relationship is determined by 
comparing the results or findings with what would occur if the variables were 
totally unrelated (independent) and if the distributions of each dependent 
variable were identical. The primary index of statistical significance is the 
p-value. The p-value represents the probability of obtaining a finding at 
least as strong as the one observed through chance alone, that is, if no 
relationship existed in the population. For example, if we were examining 
the correlation between two variables, a p-value of .05 would indicate 
that, assuming that there was no such relationship between those variables 
whatsoever, we could expect to find a similar result, by chance, about 
5 times out of 100. In other 
words, significance levels inform us about the degree of confidence that 
we can have in our findings. 

There is a wide selection of correlations that, for the most part, are de- 
termined by the type of scale (i.e., nominal, ordinal, interval, or ratio) on 
which the variables are measured. One of the most widely used correla- 
tions is the Pearson product-moment correlation, often referred to as the 
Pearson r. The Pearson r is used to examine associations between two vari- 
ables that are measured on either ratio or interval scales. For example, the 
Pearson r could be used to examine the correlation between days of exer- 
cise and pounds of weight loss. 
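To make the calculation concrete, the following Python sketch computes a Pearson r and the corresponding coefficient of determination from first principles. The exercise and weight-loss figures are hypothetical values invented for this illustration, and `pearson_r` is our own helper rather than a library function.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for six participants.
exercise_days = [2, 4, 5, 7, 9, 10]
pounds_lost = [1, 2, 4, 5, 6, 9]

r = pearson_r(exercise_days, pounds_lost)
r_squared = r ** 2      # proportion of variance accounted for

print(round(r, 2), round(r_squared, 2))
```

Because the sign of r is lost when it is squared, the coefficient of determination should always be reported together with the direction of the relationship.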

Other types of correlations include the following: 

• Point-biserial (r_pbi): This is used to examine the relationship 
between a variable measured on a naturally occurring dichotomous 
nominal scale and a variable measured on an interval (or ratio) 
scale (e.g., a correlation between gender [dichotomous] and SAT 
scores [interval]). 

• Spearman rank-order (r_s): This is used to examine the relationship 
between two variables measured on ordinal scales (e.g., a 
correlation of class rank [ordinal] and socioeconomic status [ordinal]). 

• Phi (Φ): This is used to examine the relationship between two 
variables that are naturally dichotomous (nominal-dichotomous; 
e.g., a correlation of gender [nominal] and marital status 
[nominal-dichotomous]). 

• Gamma (γ): This is used to examine the relationship between one 
nominal variable and one variable measured on an ordinal scale 
(e.g., a correlation of ethnicity [nominal] and socioeconomic 
status [ordinal]). 



Inferential Statistics 

In the previous section, we provided a general overview of the most widely 
used descriptive statistics, including measures of central tendency, disper- 
sion, and correlation. In addition to describing and examining associations 
of variables within our data sets, we often conduct research to answer 
questions about the greater population. Because it would not be feasible 
to collect data from the entire population, researchers conduct research 
with representative samples (see Chapters 2 and 3) in an attempt to draw 
inferences about the populations from which the samples were drawn. 
The analyses used to examine these inferences are appropriately referred 
to as inferential statistics. 

Inferential statistics help us to draw conclusions beyond our immediate 
samples and data. For example, inferential statistics could be used to infer, 
from a relatively small sample of employees, what the job satisfaction is 
likely to be for a company's entire work force. Similarly, inferential statis- 
tics could be used to infer, from between-group differences in a particular 
study sample, how effective a new treatment or medication may be for a 
larger population. In other words, inferential statistics help us to draw 
general conclusions about the population on the basis of the findings 
identified in a sample. However, as with any generalization, there is some degree 
of uncertainty or error that must be considered. Fortunately, inferential 
statistics provide us with not only the means to make inferences, but the 
means to specify the amount of probable error as well. 

Inferential statistics typically require random sampling. As discussed in 
Chapters 2 and 3, this increases the likelihood that a sample, and the data 
that it generates, are representative of the population. Although there are 
other techniques for acquiring a representative sample (e.g., selecting in- 
dividuals that match the population on the most important characteris- 
tics), random sampling is considered to be the best method, because it 
works to ensure representativeness on all characteristics of the popula- 
tion — even those that the researcher may not have considered. 

Inferences begin with the formulation of specific hypotheses about what 
we expect to be true in the population. However, as discussed in Chapter 2, 
we can never actually prove a hypothesis with complete certainty. Therefore, 
we must test the null hypothesis, and determine whether it should be re- 
tained or rejected. For example, in a randomized controlled trial (see Chap- 
ter 5), we may expect, based on prior research, that a group receiving a 
certain treatment would have better outcomes than a group receiving a 
standard treatment. In this case, the null hypothesis would predict no 
between-group differences. Similarly, in the case of correlation, the null 
hypothesis would predict that the variables in question would not be related. 

There are numerous inferential statistics for researchers to choose 
from. The selection of the appropriate statistics is largely determined by 
the nature of the research question being asked and the types of variables 
being analyzed. Because a comprehensive review of inferential statistics 
could fill many volumes of text, we will simply provide a basic overview of 
several of the most widely used inferential statistical procedures, including 
the t-test, analysis of variance (ANOVA), chi-square, and regression. 

T-Test 

T-tests are used to test mean differences between two groups. In general, 
they require a single dichotomous independent variable (e.g., an 
experimental and a control group) and a single continuous dependent variable. 
For example, t-tests can be used to test for mean differences between 
experimental and control groups in a randomized experiment, or to test for 
mean differences between two groups in a nonexperimental context (such 
as whether cocaine users or heroin users report more criminal activity). When 
a researcher wishes to compare the average (mean) performance between 
two groups on a continuous variable, he or she should consider the t-test. 
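As a rough illustration of what a t-test computes, the sketch below calculates the t statistic for two independent groups. It assumes the classic pooled-variance (Student's) form of the test, which presumes roughly equal variances in the two groups; other variants exist. The outcome scores are hypothetical, and `independent_t` is our own helper name.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two group variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df

# Hypothetical outcome scores for five participants per group.
experimental = [12, 15, 11, 14, 13]
control = [9, 10, 8, 11, 12]

t_value, df = independent_t(experimental, control)
print(round(t_value, 2), df)
```

The resulting t value is then compared with the critical value for the relevant degrees of freedom (2.306 for 8 degrees of freedom at the two-tailed .05 level), or converted to a p-value by statistical software.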

Analysis of Variance (ANOVA) 

Often characterized as an omnibus t-test, an ANOVA is also a test of mean 
comparisons. In fact, one of the only differences between a t-test and an 
ANOVA is that the ANOVA can compare means across more than two 
groups or conditions. Therefore, a t-test is just a special case of ANOVA. 
If you analyze the means of two groups by ANOVA, you get the same 
results as doing it with a t-test. Although a researcher could use a series of 
t-tests to examine the differences between more than two groups, this 
would not only be less efficient, but it would add experiment-wise error 
(see Rapid Reference 7.4), thereby increasing the chances of spurious 
results (i.e., Type I errors; see Chapter 1) and compromising statistical 
conclusion validity. 

Interestingly, despite its name, the ANOVA works by comparing the 
differences between group means rather than the differences between 
group variances. The name "analysis of variance" comes from the way the 
procedure uses variances to decide whether the means are different. 

There are numerous different variations of the ANOVA procedure to 
choose from, depending on the study hypothesis and research design. For 
example, a one-way ANOVA is used to compare the means of two or more 
levels of a single independent variable. So, we may use an ANOVA to 
examine the differential effects of three types of treatment on level of 
depression. 
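The way in which the procedure "uses variances" can be seen in the following sketch of a one-way ANOVA. The F statistic is the ratio of the variability between the group means to the variability within the groups. The depression scores are hypothetical, and `one_way_anova` is our own helper written for illustration.

```python
def one_way_anova(*groups):
    """Return the F statistic with its between- and within-groups df."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of the individual scores around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical depression scores under three treatments.
treatment_1 = [12, 15, 11, 14, 13]
treatment_2 = [9, 10, 8, 11, 12]
treatment_3 = [10, 13, 12, 11, 14]

f_three, df_b, df_w = one_way_anova(treatment_1, treatment_2, treatment_3)
print(round(f_three, 2), df_b, df_w)

# With only two groups, F equals the square of the pooled-variance t statistic.
f_two, _, _ = one_way_anova(treatment_1, treatment_2)
print(round(f_two, 2))
```

For the first two groups the pooled-variance t statistic is 3.0, and the two-group F printed on the last line is its square, which illustrates why a t-test can be regarded as a special case of ANOVA.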



Treatment for Depression 

Treatment 1 Treatment 2 Treatment 3 

Rapid Reference 7.4 

Multiple Comparisons and Experiment-wise Error 

Most research studies perform many tests of their hypotheses. For 
example, a researcher testing a new educational technique may choose to 
examine the technique's effectiveness by measuring students' test scores, 
satisfaction ratings, class grades, and SAT scores. If there is a 5% chance 
(with a p-value of .05) of finding a significant result by chance on one 
outcome measure, there is up to a 20% chance (.05 × 4) of finding at least 
one such result when using four outcome measures. This inflated likelihood 
of achieving a significant result is referred to as experiment-wise error. 
This can be corrected 
for either by using a statistical test that takes this error into account (e.g., 
multivariate ANOVA, or MANOVA; see text) or by lowering the p-value to 
account for the number of comparisons being performed. The simplest 
and the most conservative method of controlling for experiment-wise 
error is the Bonferroni correction. Using this correction, the researcher simply 
divides the set p-value by the number of statistical comparisons being 
made (e.g., .05/4 = .0125). The resulting p-value is then the new criterion 
that must be obtained to reach statistical significance. 
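The arithmetic behind this correction can be sketched in Python. The .05 × 4 figure is best read as an upper bound on the experiment-wise error; if we additionally assume that the four tests are independent of one another (an assumption of this sketch, not a claim made in the text), the chance of at least one spurious result can be computed exactly. Both function names are our own.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test criterion after the Bonferroni correction."""
    return alpha / n_tests

alpha, n_tests = 0.05, 4

upper_bound = alpha * n_tests                       # simple additive bound
exact_if_independent = familywise_error(alpha, n_tests)
corrected_alpha = bonferroni_threshold(alpha, n_tests)

print(round(upper_bound, 4), round(exact_if_independent, 4), corrected_alpha)

# With the corrected criterion, the experiment-wise error stays below .05.
print(round(familywise_error(corrected_alpha, n_tests), 4))
```

Under independence the exact figure is about 18.5%, slightly below the 20% bound. The gap between the two grows as the number of comparisons increases, which is one reason the Bonferroni correction is described as conservative.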



Alternatively, multifactor ANOVAs can be used when a study involves 
two or more independent variables. For example, a researcher might 
employ a 2 × 3 factorial design (see Chapter 5) to examine the effectiveness 
of the different treatments (Factor 1) and high or low levels of physical 
exercise (Factor 2) in reducing symptoms of depression. 
Because the study involves two factors (or independent variables), the 
researcher would conduct a two-way ANOVA. Similarly, if the study had 
three factors, a three-way ANOVA would be used, and so forth. A 
multifactor ANOVA allows a researcher to examine not only the main effects 
of each independent variable (the different treatments and high or low 
levels of exercise) on depression, but also the potential interaction of the two 
independent variables in combination. 

Still another variant of the ANOVA is the multivariate analysis of variance, or 
MANOVA. The MANOVA is used when there are two or more 
dependent variables that are generally related in some way. Using the previous 
example, let's say that we were measuring the effect of the different treat- 
ments, with or without exercise, on depression measured in several differ- 
ent ways. Although we could conduct separate ANOVAs for each of these 
outcomes, the MANOVA provides a more efficient and more informative 
way of analyzing the data. 

Chi-Square (χ²) 

The inferential statistics that we have discussed so far (i.e., t-tests, 
ANOVA) are appropriate only when the dependent variables being mea- 
sured are continuous (interval or ratio). In contrast, the chi-square statistic 
allows us to test hypotheses using nominal or ordinal data. It does this 
by testing whether one set of proportions is higher or lower than you 
would expect by chance. Chi-square summarizes the discrepancy between 
observed and expected frequencies. The smaller the overall discrepancy 
is between the observed and expected scores, the smaller the value of 
the chi-square will be. Conversely, the larger the discrepancy is between 
the observed and expected scores, the larger the value of the chi-square 
will be. 

For example, in a study of employment skills, a researcher may ran- 
domly assign consenting individuals to an experimental or a standard 
skills-training intervention. The researcher might hypothesize that a 
higher percentage of participants who attended the experimental inter- 
vention would be employed at 1 year follow-up. Because the outcome be- 
ing measured is dichotomous (employed or not employed), the researcher 
could use a chi-square to test the null hypothesis that employment at the 
1 year follow-up is not related to the skills training. 

Similarly, chi-square analysis is often used to examine between-group 
differences on categorical variables, such as gender, marital status, or 
grade level. The main thing to remember is that the data must be nominal 
or ordinal because chi-square is a test of proportions. Also, because it 
compares the tallies of categorical responses between two or more groups, 
the chi-square statistic can be computed only on actual counts and not 
on precalculated percentages or proportions. 
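The comparison of observed and expected frequencies can be sketched as follows for the employment example. The counts are hypothetical, and `chi_square` is our own helper. The sketch computes the basic Pearson chi-square without any continuity correction, which is a simplifying assumption on our part.

```python
def chi_square(observed):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts at the 1-year follow-up: [employed, not employed].
observed = [
    [30, 20],   # experimental skills training
    [20, 30],   # standard skills training
]

chi2, df = chi_square(observed)
print(round(chi2, 2), df)
```

Here every expected count is 25, so each of the four cells contributes 1.0 to the statistic. The function takes raw counts for exactly the reason given above: the same percentages based on a larger or smaller sample would yield a different chi-square value.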

Regression 

Linear regression is a method of estimating or predicting a value on some de- 
pendent variable given the values of one or more independent variables. 
Like correlations, statistical regression examines the association or rela- 
tionship between variables. Unlike with correlations, however, the pri- 
mary purpose of regression is prediction. For example, insurance ad- 
justers may be able to predict or come close to predicting a person's life 
span from his or her current age, body weight, medical history, history of 
tobacco use, marital status, and current behavioral patterns. 

There are two basic types of regression analysis: simple regression and 
multiple regression. In simple regression, we attempt to predict the depen- 
dent variable with a single independent variable. In multiple regression, as in 
the case of the insurance adjuster, we may use any number of independent 
variables to predict the dependent variable. 
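For simple regression, the prediction line can be found with the ordinary least-squares formulas, as in the minimal sketch below. The data are hypothetical values chosen so that the arithmetic is easy to follow, and `fit_line` and `predict` are our own helper names.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: cross-products of deviations over squared deviations of x.
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x_value, slope, intercept):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x_value

# Hypothetical independent (x) and dependent (y) variable values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2))
print(round(predict(6, slope, intercept), 2))
```

Multiple regression extends the same idea to several independent variables at once, but the calculations are then normally left to statistical software.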

Logistic regression, unlike its linear counterpart, is unique in its ability to 
predict dichotomous variables, such as the presence or absence of a spe- 
cific outcome, based on a specific set of independent or predictor vari- 
ables. Like correlation, logistic regression provides information about the 
strength and direction of the association between the variables. In addi- 
tion, logistic regression coefficients can be used to estimate odds ratios for 
each of the independent variables in the model. These odds ratios can tell us 
how likely a dichotomous outcome is to occur given a particular set of in- 
dependent variables. 

A common application of logistic regression is to determine whether 
and to what degree a set of hypothesized risk factors might predict the on- 
set of a certain condition. For example, a drug abuse researcher may wish 
to determine whether certain lifestyle and behavioral patterns place 
former drug abusers at risk for relapse. The researcher may hypothesize that 
three specific factors — living with a drug or alcohol user, psychiatric sta- 
tus, and employment status — will predict whether a former drug abuser 
will relapse within 1 month of completing drug treatment. By measuring 
these variables in a sample of successful drug-treatment clients, the re- 
searcher could build a model to predict whether they will have relapsed by 
the 1-month follow-up assessment. The model could also be used to 
estimate the odds ratios for each variable. For example, the odds ratios could 
provide information on how much more likely unemployed individuals 
are to relapse than employed individuals. 
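
The sketch below shows how an odds ratio for a single dichotomous predictor can be computed and how it relates to a logistic regression coefficient. The counts are hypothetical and are not taken from any study. The sketch assumes a model with employment status as the only predictor, in which case the coefficient is simply the natural log of the odds ratio.

```python
import math

# Hypothetical 1-month follow-up counts, invented for illustration
relapsed = {"unemployed": 30, "employed": 15}
abstinent = {"unemployed": 20, "employed": 35}

# Odds of relapse within each group: relapsed / not relapsed
odds_unemployed = relapsed["unemployed"] / abstinent["unemployed"]
odds_employed = relapsed["employed"] / abstinent["employed"]

odds_ratio = odds_unemployed / odds_employed
# With one dichotomous predictor, the logistic coefficient is ln(odds ratio)
coefficient = math.log(odds_ratio)

print(f"odds ratio = {odds_ratio:.2f}, coefficient = {coefficient:.3f}")
```

Note that an odds ratio of 3.5 means the odds of relapse are 3.5 times as high for unemployed clients. This is not the same as saying that relapse is 3.5 times as probable; here the relapse rates are 60% versus 30%.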

INTERPRETING DATA AND DRAWING INFERENCES 

Even researchers who carefully planned their studies and collected, man- 
aged, and analyzed their data with the highest integrity might still make 
mistakes when interpreting their data. Unfortunately, although all of the 
previous steps are necessary, they are far from sufficient to ensure that the 
moral of the story is accurately understood and disseminated. This section 
will highlight some of the most critical issues to consider when interpret- 
ing data and drawing inferences from your findings. 

Are You Fully Powered? 

One of the ways that study findings can be misinterpreted is through in- 
sufficient statistical power. Until fairly recently, most research studies were 
conducted without any consideration of this concept. In simple terms, 
statistical power is a measure of the probability that a statistical test 
will reject a false null hypothesis, or in other words, the probability of 
finding a significant result when there really is an effect. The higher the 
power of a statistical test, the more likely one is to find statistical 
significance if the null hypothesis is actually false (i.e., if there 
really is an effect). 

For example, to test the null hypothesis that Republicans are as intelli- 
gent as Democrats, a researcher might recruit a random bipartisan sample, 
have them complete certain measures of intelligence, and compare their 
mean scores using a t-test or ANOVA. If Republicans and Democrats do 
indeed differ on intelligence in the population, but the sample data indi- 
cate that they do not, a Type II error has been made (see Chapter 1 for a 
discussion of Type I and Type II errors). A potential reason that the study 
reached such a faulty conclusion may be that it lacked sufficient statistical 
power to detect the actual differences between Republicans and Democ- 
rats. 

According to Cohen (1988), studies should strive for statistical power 
of .80 or greater to limit the risk of Type II errors. Statistical power is 
largely determined by three factors: (1) the significance criterion (e.g., 
.05, .01); (2) the effect size (i.e., the magnitude of the differences 
between group means or other test statistics); and (3) the size of the 
sample. Researchers should calculate the statistical power of each of their 
planned analyses prior to beginning a study. This will allow them to 
determine the sample size necessary to obtain sufficient power (.80 or 
greater) based on the set significance criterion and the anticipated effect 
size. 
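
To make the relationship among these three factors concrete, the following sketch estimates the sample size needed per group for a two-sided comparison of two group means. It relies on a normal approximation, which is an assumption of this example; exact t-based calculations give slightly larger numbers. The function name and default values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group mean comparison.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # cutoff for the significance criterion
    z_power = z.inv_cdf(power)          # cutoff for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

medium = n_per_group(0.5)  # a "medium" effect in Cohen's terms
small = n_per_group(0.2)   # a "small" effect
print(f"d = 0.5: {medium} per group; d = 0.2: {small} per group")
```

Under these assumptions, a medium effect requires roughly 63 participants per group and a small effect roughly 393. Tightening the significance criterion or anticipating a smaller effect both push the required sample size upward.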

Unfortunately, determining that there is enough power at the outset of 
a study does not always ensure that sufficient power will be available at the 
time of the analysis. Many changes may occur in the interim. For example, 
the sample size may be reduced, due to lower than expected recruitment 
rates or attrition; or the effect sizes may be smaller than expected. In any 
case, the take-home message for researchers is that they must always con- 
sider how much power is available to detect differences between groups. 
This is particularly important when interpreting the results of a study in 
which no significant differences were found, because it may be that sig- 
nificant differences existed, but there was insufficient power to detect 
them. 

Are Your Distributions in Good Shape? 

Another factor that can lead to faulty interpretations of statistical findings 
is the failure to consider the characteristics of the distribution. Virtually all 
statistical tests have certain basic assumptions. For example, parametric 
tests (e.g., t-tests, ANOVA, linear regression) require that the 
distribution of data meet certain requirements (i.e., normality and 
independence). Failure to meet these assumptions may cause the results of 
an analysis to be inaccurate. Although statistics such as the t-test and 
ANOVA are considered relatively robust (see Rapid Reference 7.5) in terms 
of their sensitivity to normality, this is less true for the assumption of 
independence. For example, if a researcher were comparing the effect of two 
different teachers' methods on students' final grades, the researcher would 
have to make certain that none of the students had classes with both 
teachers. If certain students had classes with both teachers, and were 
therefore exposed to both teaching methods, the assumption of independence 
would have been violated. Because of this, probability statements 
regarding Type I and Type II errors may be seriously affected. 

Rapid Reference 7.5 

Robustness of Statistical Tests 

Robustness of a statistical test refers to the degree to which it is 
resistant to violations of certain assumptions. The robustness of certain 
statistical techniques does not mean they are totally immune to such 
violations, but merely that they are less sensitive to them. 

Another aspect of the distribution that should be considered when in- 
terpreting study findings is data outliers. As discussed earlier, extreme val- 
ues in the distribution can substantially skew the shape of the distribution 
and alter the sample mean. Researchers should carefully examine the dis- 
tributions of their data to identify potential outliers. Once identified, out- 
liers can be either replaced with missing values or transformed through 
one of several available procedures (discussed previously in this chapter). 

Still another aspect of the distribution that should be considered when 
analyzing and interpreting data is the range of values. Researchers often 
fail to find significant relationships because of the restricted range or vari- 
ance of a dependent variable. For example, suppose you were examining 
the relationship between IQ and SAT scores, but everyone in the sample 
scored between 1100 and 1200 on their SATs. In this case, because of the 
restricted range, you would be unlikely to find a significant relationship, 
even if one did exist in the population. 
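
The effect of a restricted range can be demonstrated with a small sketch. The data below are entirely artificial: y tracks x closely across the full range, with a fixed repeating pattern standing in for random error. The variable names and the cutoffs are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Artificial scores: y equals x plus a deterministic "noise" pattern
x = list(range(100))
y = [xi + 3 * ((xi * 37) % 21 - 10) for xi in x]

r_full = pearson_r(x, y)

# Keep only the cases whose x value falls in a narrow band
narrow = [(xi, yi) for xi, yi in zip(x, y) if 45 <= xi < 55]
r_restricted = pearson_r([p[0] for p in narrow], [p[1] for p in narrow])

print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The underlying relationship between x and y is the same in both calculations, yet the correlation is far weaker in the narrow band because the error now dominates the small amount of variation that remains in x.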

Are You Fishing? 

Although we covered the issue of multiple comparisons and experiment- 
wise error earlier in this chapter, it deserves additional mention here be- 
cause it can seriously impact the interpretation of your findings. In general, 
experiment-wise error refers to the probability of committing at least one 
Type I error across a set of statistical tests in the same experiment. When 
you make many comparisons involving the same data, the probability that at 
least one of the comparisons will be statistically significant by chance 
alone increases. Thus, experiment-wise error may exceed the chosen 
significance level. If you make enough comparisons, one or more of the 
results will almost certainly be significant. Colloquially, this is often 
referred to as "fishing," because if you cast out your 
line enough times you are bound to catch something. Although this may 
be a good strategy for anglers, in research it is just bad science. This issue 
is most likely to occur when examining complex hypotheses that require 
many different comparisons. Failing to correct for these multiple compar- 
isons can lead to substantial Type I error and to faulty interpretations of 
your findings. 
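
A short calculation shows how quickly experiment-wise error grows. The sketch below assumes that the comparisons are independent, which is a simplification, and uses the Bonferroni adjustment as one common example of a correction; it is not necessarily the procedure a given study should use. The function name is hypothetical.

```python
def familywise_error(alpha, k):
    """Chance of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

alpha = 0.05

# Experiment-wise error for 1, 5, 10, and 20 uncorrected comparisons
rates = {k: round(familywise_error(alpha, k), 3) for k in (1, 5, 10, 20)}
print(rates)

# Bonferroni adjustment: test each of the k comparisons at alpha / k
k = 20
bonferroni_alpha = alpha / k
corrected = familywise_error(bonferroni_alpha, k)
print(f"per-test alpha = {bonferroni_alpha}, experiment-wise error = {corrected:.3f}")
```

Under the independence assumption, 20 uncorrected comparisons at the .05 level carry about a 64% chance of at least one spurious significant result. Testing each comparison at .0025 instead keeps the experiment-wise error just under .05.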

How Reliable and Valid Are Your Measures? 

Another major factor that can affect a study's findings is measurement er- 
ror. Although most statistical analyses, and many of the researchers who 
conduct them, assume that assessment instruments are error free, this is 
usually far from the truth. In fact, assessment instruments are rarely, if 
ever, perfect (see Chapter 4 for a detailed discussion of this topic). This 
is particularly true when using unstandardized measures that may vary in 
their administration procedures, or when using instruments that have little 
if any demonstrated validity or reliability (see Chapter 6). For these 
reasons, it is essential that researchers, whenever possible, use 
psychometrically sound instruments in their studies. Using error-laden instruments 
may substantially reduce the sensitivity of your analyses and obscure oth- 
erwise significant findings. 

Statistical Significance vs. Clinical Significance 

Because of the technical and detailed nature of the research enterprise, it 
is often easy to miss the forest for the trees. Researchers can get so caught 
up in the rigor of data collection, management, and analysis that they may 
wind up believing that the final value of a research study lies in its p-value. 
This is, of course, far from the truth. The real value of a research finding 
lies in its clinical significance, not in its statistical significance. In other words, 
will the research findings affect how things are done in the real world? 

This is not to say that statistical significance is irrelevant. On the 
contrary, statistical significance is essential in determining whether a 
result is likely to reflect a real effect rather than chance. Before we can 
decide on the clinical significance of a finding, we must be somewhat 
certain that the finding is indeed valid. The misperception instead lies in 
the belief that statistical significance is meaningful in itself. In fact, 
study results can be statistically significant, but clinically meaningless. 

To interpret the clinical significance of their findings, researchers might 
examine a number of other indices, such as the effect size or the percent- 
age of participants who moved from outside a normal range to within a 
normal range. For example, a study may reveal that two different studying 
methods lead to significantly different test scores, but that neither method 
results in passing scores. When interpreting research findings, researchers 
should consider not only their statistical significance but also their 
clinical, or real-world, importance. 
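
The studying-methods example can be sketched with two such indices. The scores and the passing cutoff below are hypothetical, and Cohen's d with a pooled standard deviation is used as one common effect size measure among several.

```python
import math
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def percent_passing(scores, cutoff):
    """Percentage of scores at or above the passing cutoff."""
    return 100 * sum(score >= cutoff for score in scores) / len(scores)

# Hypothetical test scores under two studying methods; 70 is a passing score
method_a = [52, 55, 58, 61, 64]
method_b = [42, 45, 48, 51, 54]
passing = 70

d = cohens_d(method_a, method_b)
pct_a = percent_passing(method_a, passing)
pct_b = percent_passing(method_b, passing)
print(f"d = {d:.2f}; passing: method A {pct_a:.0f}%, method B {pct_b:.0f}%")
```

In this made-up example the difference between methods is very large in standardized terms, yet no one in either group passes. The effect size and the pass rate together tell a more complete story than a p-value would.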

Are There Alternative Explanations? 

As we discussed in Chapter 5, the key element in true experimental re- 
search is scientific control and the ability to rule out alternative explana- 
tions. In Chapter 5, we noted that randomization is the best way to achieve 
this type of control. This point cannot be overemphasized. Unless you can 
be relatively certain that there are no systematic differences between the 
experimental groups or conditions, and that the only thing that varies is 
the independent variable that you are manipulating, you simply cannot 
rule out other potential explanations for your findings. 

Even in randomized trials, there is a chance, however small, that there 
are between-group differences on variables other than the one that you are 
manipulating. The wise researcher should always view his or her findings 
with some degree of suspicion and always consider alternative explana- 
tions for those findings. It is this critical analysis and inability to be easily 
convinced that distinguishes true scientific endeavors from lesser pursuits. 

Are You Confusing Correlation With Causation? 

We know that we already apologized for saying this too often, but here we 
go again: Correlation is not causation, period. Significant or not, hypothe- 
sized or not, large-magnitude associations or not, simple measures of as- 
sociation should never be interpreted as demonstrating causal relation- 
ships. Where would we be if we accepted such faulty logic? We would 
probably be in a society that believes cold temperatures cause colds, or 
that rock music leads to drug abuse. Okay, so maybe we are not always so 
literal. However, the thing that sets scientists apart from laypeople (other 
than our low incomes) is our knowledge of the scientific method and our 
ability to discriminate between assumption and fact (see Chapter 1 for a 
discussion of the scientific method). 

The bottom line about causality is that it cannot be inferred without 
random assignment. In other words, the researcher must be the one who 
selects and manipulates the independent variables, and this must be done 
prospectively. If this is not the case, you may find a significant association 
between variables, but you simply cannot infer causation. Importantly, this 
is true regardless of the statistical tests that are used. It does not matter 
whether you used a linear regression, an ANOVA, or an even more so- 
phisticated statistical technique. Unless randomization and control are 
employed, causation cannot be inferred. 

How Significant Is Your Nonsignificance? 

The last point that we want to cover with regard to the interpretation of 
study results is the issue of nonsignificance. As a general guideline, re- 
searchers should not be overly invested in finding a specific outcome. That 
is, even though they may have strong rationales for hypothesizing partic- 
ular results, they should not place all their hopes on having their studies 
turn out as they may have expected. Not only could such an approach pre- 
cipitate bias, but it could lead to a common misperception among research 
scientists: namely, that nonsignificant results are not useful. On the 
contrary, nonsignificant findings can be as important as, if not more 
important than, significant ones. 

The furtherance of science depends on the empirical evaluation of 
widely held assumptions and what many consider to be common sense. 
The furtherance of science also depends on attempts to replicate research 
findings and to determine whether results obtained in one population 
generalize to other populations. In any of these cases, nonsignificant findings 
can have some very significant (important) implications. Therefore, it is 
strongly recommended that researchers be as neutral and objective as pos- 
sible when analyzing and interpreting their results. In many cases, less may, 
in fact, be more. 



CAUTION 

Publication Bias 

A number of studies (e.g., Ioannidis, 1998; Sterns & Simes, 1997) have 
found a connection between the significance of a study's findings and its 
publishability. Specifically, these researchers have found that a greater 
percentage of studies that report significant findings wind up being 
published and that there are also greater publication delays for studies 
that report nonsignificant findings. 

SUMMARY 

In this chapter, we have reviewed some of the major objectives and 
techniques involved in the preparation, analysis, and interpretation of 
study data. In the first section, we discussed the importance of properly 
logging and screening data, designing a well-structured database and 
codebook, and transforming variables into an efficient and analyzable 
form. In the second section, we covered the two primary categories of 
statistical analyses (descriptive and inferential) and provided a brief 
overview of several of the most widely used analytic techniques. In the 
last section, we presented a wide range of issues that researchers should 
consider when interpreting their research findings. Specifically, we 
sought to express the potential influence that issues such as power, 
statistical assumptions, multiple comparisons, measurement error, clinical 
significance, alternative explanations, and inferences about causality can 
have on the way that you interpret your data. 



TEST YOURSELF 

1. A written or computerized record that provides a clear and 
comprehensive description of all variables entered into a database is 
known as a ________. 

2. ________ statistics are generally used to accurately characterize the 
data collected from a study sample. 

3. A graph that illustrates the frequency of observations by groups is known 
as a ________. 

4. A measure of the spread of values around the mean of a distribution is 
known as the ________. 

5. Analysis of variance (ANOVA) is used to measure differences in group 
________. 

Answers: 1. data codebook; 2. Descriptive; 3. histogram; 4. standard deviation; 5. means 

ETHICAL CONSIDERATIONS IN RESEARCH 



In the previous chapters, we reviewed many of the methodological 
issues that should be considered when conducting research. We dis- 
cussed how researchers should begin their research endeavors by gen- 
erating relevant questions, formulating clear and testable hypotheses, and 
selecting appropriate and practical research designs. By adhering to the 
scientific method, researchers can, in due course, obtain valid and reliable 
findings that may advance scientific knowledge. 

Unavoidably, however, to advance knowledge in this manner it is often 
necessary to impinge upon the rights of individuals. Virtually all studies 
with human participants involve some degree of risk. These risks may 
range from minor discomfort or embarrassment caused by somewhat in- 
trusive or provocative questions (e.g., questions about sexual practices, 
drug and alcohol use) to much more severe effects on participants' physi- 
cal or emotional well-being. These risks present researchers with an ethi- 
cal dilemma regarding the degree to which participants should be placed 
at risk in the name of scientific progress. 

A number of ethical codes have been developed to provide guidance 
and establish principles to address such ethical dilemmas. These codes in- 
clude federally mandated regulations promulgated by the U.S. Depart- 
ment of Health and Human Services (Title 45, Part 46 of the Code of Fed- 
eral Regulations), as well as those developed for specific fields of study, such 
as the APA's Ethical Principles of Psychologists and Code of Conduct (2002). 
These codified principles are intended to ensure that researchers consider 
all potential risks and ethical conflicts when designing and conducting 
research. Moreover, these principles are intended to protect research 
participants from harm (Sieber & Stanley, 1988). 

To help the reader better contextualize and appreciate the importance 
of the protection of research participants, this chapter will begin by re- 
viewing the historical evolution of research ethics. We will then discuss the 
fundamental ethical principles of respect for persons, beneficence, and 
justice, which serve as the foundation for the formal protection of re- 
search participants. Finally, we will review two of the most essential pro- 
cesses in the protection of research participants: informed consent and 
the institutional review board. The purpose of this chapter is to familiar- 
ize the reader with some of the most common ethical issues in research 
with human participants, and it should not be considered a comprehen- 
sive review of all ethical principles and regulatory and legal guidelines and 
requirements. Before researchers undertake any study involving human 
participants, they should consult the specific rules of their institutions, the 
requirements of their institutional review boards, and applicable federal 
regulations, including Title 45, Part 46 of the Code of Federal Regulations. 

HISTORICAL BACKGROUND 

Many of the most significant medical and behavioral advancements of the 
20th century, including vaccines for diseases such as smallpox and polio, 
required years of research and testing, much of which was done with 
human participants. Regrettably, however, many of these well-known ad- 
vancements have somewhat sinister histories, as they were made at the ex- 
pense of vulnerable populations such as inpatient psychiatric patients and 
prisoners, as well as noninstitutionalized minorities. In fact, a large pro- 
portion of these study participants were involved in clinical research with- 
out ever being informed. Revelations about Nazi medical experiments and 
unethical studies conducted within the United States (e.g., the Tuskegee 
Syphilis Study [see Rapid Reference 8.1]; Milgram's Obedience and 
Individual Responsibility Study [Milgram, 1974]; human radiation 
experiments) heightened public awareness about the potential for, and the 
often tragic consequences of, research misconduct. 

Rapid Reference 8.1 

The Tuskegee Syphilis Study 

In 1932, the U.S. Public Health Service began a 40-year longitudinal study 
to examine the natural course of untreated syphilis. Four hundred Black 
men living in Tuskegee, Alabama, who had syphilis were compared to 200 
uninfected men. Participants were recruited with the promise that they 
would receive "special treatment" for their "bad blood." Horrifyingly, 
government officials went to extreme lengths to ensure that the participants 
in fact received no therapy from any source. The "special treatment" that 
was promised was actually very painful spinal taps, performed without 
anesthesia, not as a treatment, but merely to evaluate the neurological 
effects of syphilis. Moreover, even though penicillin was identified as an 
effective treatment for syphilis as early as the 1940s, the 400 infected men 
were never informed about or treated with the medication. By 1972, 
when public revelations and outcry forced the government to end the 
study, only 74 of the original 400 infected participants were still alive. 
Further examination revealed that somewhere between 28 and 100 of these 
participants had died as a direct result of their infections. 



Over the past half-century, the international and U.S. medical commu- 
nities have taken a number of steps to protect individuals who participate 
in research studies. Developed in response to the Nuremberg Trials of 
Nazi doctors who performed unethical experimentation during World 
War II, the Nuremberg Code (see Rapid Reference 8.2) was the first ma- 
jor international document to provide guidelines on research ethics. It 
made voluntary consent a requirement in clinical research studies, empha- 
sizing that consent can be voluntary only under the following conditions: 

1. Participants are able to consent. 

2. They are free from coercion (i.e., outside pressure). 

3. They comprehend the risks and benefits involved. 

The Nuremberg Code also clearly requires that researchers should 
minimize risk and harm, ensure that risks do not significantly outweigh 
potential benefits, use appropriate study designs, and guarantee 
participants' freedom to withdraw at any time. The Nuremberg Code was 
adopted by the United Nations General Assembly in 1948. 

Rapid Reference 8.2 

The Nuremberg Code 

1. The voluntary consent of the human subject is absolutely essential. 

2. The experiment should be such as to yield fruitful results for the 
good of society unprocurable by other methods or means of study 
and not random and unnecessary in nature. 

3. The experiment should be so designed and based on the results of 
animal experimentation and a knowledge of the natural history of the 
disease or other problem under study that the anticipated results will 
justify the performance of the experiment. 

4. The experiment should be so conducted as to avoid all unnecessary 
physical and mental suffering and injury. 

5. No experiment should be conducted, where there is an a priori 
reason to believe that death or disabling injury will occur; except, 
perhaps, in those experiments where the experimental physicians also 
serve as subjects. 

6. The degree of risk to be taken should never exceed that determined 
by the humanitarian importance of the problem to be solved by the 
experiment. 

7. Proper preparations should be made and adequate facilities provided 
to protect the experimental subject against even remote possibilities 
of injury, disability, or death. 

8. The experiment should be conducted only by scientifically qualified 
persons. The highest degree of skill and care should be required 
through all stages of the experiment of those who conduct or engage 
in the experiment. 

9. During the course of the experiment, the human subject should be at 
liberty to bring the experiment to an end, if he has reached the 
physical or mental state, where continuation of the experiment seemed to 
him to be impossible. 

10. During the course of the experiment, the scientist in charge must be 
prepared to terminate the experiment at any stage, if he has probable 
cause to believe, in the exercise of the good faith, superior skill and 
careful judgment required of him, that a continuation of the 
experiment is likely to result in injury, disability, or death to the 
experimental subject. 

Source: Trials of War Criminals Before the Nuremberg Military Tribunals Under Control Council 
Law No. 10. (1949). Vol. 2, pp. 181-182. Washington, D.C.: U.S. Government Printing Office. 

The next major development in the protection of research participants 
came in 1964 at the 18th World Medical Assembly in Helsinki, Finland. 
With the establishment of the Helsinki Declaration, the World Medical 
Association adopted 12 principles to guide physicians on ethical consid- 
erations related to biomedical research. Among its many contributions, 
the declaration helped to clarify the very important distinction between 
medical treatment, which is provided to directly benefit the patient, and med- 
ical research, which may or may not provide a direct benefit. The declaration 
also recommended that human biomedical research adhere to accepted 
scientific principles and be based on scientifically valid and rigorous labo- 
ratory and animal experimentation, as well as on a thorough knowledge of 
scientific literature. These guidelines were revised at subsequent meetings 
in 1975, 1983, and 1989. 

In 1974, largely in response to the Tuskegee Syphilis Study, the U.S. 
Congress passed the National Research Act, creating the National Com- 
mission for the Protection of Human Subjects of Biomedical and Behav- 
ioral Research. The National Research Act led to the development of in- 
stitutional review boards (IRBs). These review boards, which we will describe 
in detail later, are specific human-subjects committees that review and de- 
termine the ethicality of research. The National Research Act required 
IRB review and approval of all federally funded research involving human 
participants. The Commission was responsible for (1) identifying the eth- 
ical principles that should govern research involving human participants 
and (2) recommending steps to improve the Regulations for the Protec- 
tion of Human Subjects. 

In 1979, the National Commission for the Protection of Human Sub- 
jects of Biomedical and Behavioral Research issued "The Belmont Report: 
Ethical Principles and Guidelines for the Protection of Human Subjects 
of Research." The Belmont Report established three principles that un- 
derlie the ethical conduct of all research conducted with human partici- 
pants: (1) respect for persons, (2) beneficence, and (3) justice (see Rapid 
Reference 8.3). 

Rapid Reference 8.3 

The Belmont Report: Summary of Basic Principles 

1. Respect for Persons 

Respect for persons incorporates at least two ethical convictions: first, 
that individuals should be treated as autonomous agents, and second, that 
persons with diminished autonomy are entitled to protection. The prin- 
ciple of respect for persons thus divides into two separate moral require- 
ments: the requirement to acknowledge autonomy, and the requirement 
to protect those with diminished autonomy. 

2. Beneficence 

Persons are treated in an ethical manner not only by respecting their deci- 
sions and protecting them from harm, but also by making efforts to se- 
cure their well-being. Such treatment falls under the principle of benefi- 
cence. The term "beneficence" is often understood to cover acts of 
kindness or charity that go beyond strict obligation. In this document, 
beneficence is understood in a stronger sense, as an obligation. Two 
general rules have been formulated as complementary expressions of 
beneficent actions in this sense: (1) do not harm, and (2) maximize possible 
benefits, and minimize possible harms. 

3. Justice 

Who ought to receive the benefits of research and bear its burdens? This 
is a question of justice, in the sense of "fairness in distribution" or "what is 
deserved." An injustice occurs when some benefit to which a person is 
entitled is denied without good reason, or when some burden is imposed 
unduly. Another way of conceiving the principle of justice is that equals 
ought to be treated equally. However, this statement requires explication. 
Who is equal and who is unequal? What considerations justify departure 
from equal distribution? Almost all commentators allow that distinctions 
based on experience, age, deprivation, competence, merit, and position 
do sometimes constitute criteria justifying differential treatment for 
certain purposes. It is necessary, then, to explain in what respects people 
should be treated equally. There are several widely accepted formulations 
of just ways to distribute burdens and benefits. Each formulation mentions 
some relevant property, on the basis of which burdens and benefits 
should be distributed. These formulations are (1) to each person an equal 
share, (2) to each person according to individual need, (3) to each person 
according to individual effort, (4) to each person according to societal 
contribution, and (5) to each person according to merit. 

The Belmont Report explains how these principles apply to research 
practices. For example, it identifies informed consent as a process that is 
essential to the principle of respect for persons. In response to the Belmont Report, 
both the U.S. Department of Health and Human Services and the U.S. 
Food and Drug Administration revised their regulations on research stud- 
ies that involve human participants. 

In 1994, largely in response to information about 1940s experiments 
involving the injection of research participants with plutonium as well as 
other radiation experiments conducted on indigent patients and children 
with mental retardation (see Rapid Reference 8.4), President Clinton 
created the National Bioethics Advisory Commission (NBAC). Since its 
inception, NBAC has generated a total of 10 reports. These reports have 
served to provide advice and make recommendations to the National Science 
and Technology Council and to other government entities, and to 
identify broad principles to govern the ethical conduct of research. 

Rapid Reference 8.4 

Human Radiation Experiments 

President William J. Clinton formed the Advisory Committee on Human 
Radiation Experiments in 1994 to uncover the history of human radiation 
experiments. According to the committee's final report, several 
agencies of the United States government, including the Atomic Energy 
Commission, and several branches of the military services, conducted or 
sponsored thousands of human radiation experiments and several hundred 
intentional releases of radiation between the years of 1946 and 
1974. Among the committee's harshest criticisms was that physicians 
used patients without their consent in experiments in which the patients 
could not possibly benefit medically. The principal purpose of these 
experiments was ostensibly to help atomic scientists understand the 
potential dangers of nuclear war and radiation fallout. These experiments 
were conducted in "secret" with the belief that this was necessary to 
protect national security. The committee concluded that the government was 
responsible for failing to implement many of its own protection policies. 
The committee further concluded that individual researchers failed to 
comply with the accepted standards of professional ethics. In October 
1995, after receiving the committee's final report, President Clinton 
offered a public apology to the experimental subjects, and in March 1997, 
he agreed to provide financial compensation to all of the individuals who 
were injured. 

FUNDAMENTAL ETHICAL PRINCIPLES 

The many post-Nuremberg efforts just reviewed have largely defined the 
philosophical and administrative basis for most existing codes of research 
ethics. Although these codes may differ slightly across jurisdictions and 
disciplines, they all emphasize the protection of human participants and, 
as outlined in the Belmont Report, have been established to ensure au- 
tonomy, beneficence, and justice. 

Respect for Persons 

As described in the Belmont Report, "Respect for persons incorporates at 
least two ethical mandates: first, that individuals be treated as autonomous 
agents, and second, that individuals with diminished autonomy are entitled 
to protection" (1979, p. 4). The concept of autonomy, which is clearly integral 
to this principle, means that human beings have the right to decide what 
they want to do and to make their own decisions about the kinds of research 
experiences they want to be involved in, if any. In cases in which one's au- 
tonomy is diminished due to cognitive impairment, illness, or age, the re- 
searcher has an obligation to protect the individual's rights. Respect for per- 
sons therefore serves as the underlying basis for what might be considered 
the most fundamental ethical safeguard underlying research with human 
participants: the requirement that researchers obtain informed consent 
from individuals who freely volunteer to participate in their research. 

Coercion, or forcing someone to participate in research, is antithetical 
to the idea of respect for persons and is clearly unethical. Although there 
are many safeguards in place to ensure that explicit coercion into research,
such as that practiced in Nazi concentration camps, is no longer
likely, there are still many situations in which more subtle or implicit
coercion may take place. For example, consider a population of prison inmates
or individuals who have just been arrested. If they are asked to participate 
in a study, is it coercive? It may be, if the prison administrators, judge, or
other criminal justice staff are the ones who ask them to participate, or if the
distinction between researchers and criminal justice staff is unclear. In such
instances, the participants may feel unduly pressured or coerced to partic- 
ipate in the study, fearing negative repercussions if they choose to decline. 
This type of implicit coercion might also occur in any situation in which 
the participant is in a vulnerable position or in which the study recruiter or 
perceived recruiter is in a position of power or authority (e.g., teacher- 
student, employer-employee). 

Importantly, the principle of respect for persons does not mean that 
potentially vulnerable or coercible populations should be prevented from 
participating in research. On the contrary, respect for persons means that 
these individuals should have every right to participate in research if they 
so choose. The main point is that these individuals should be able to make 
this decision autonomously. For these reasons, it is probably good practice 
for researchers to maintain clear boundaries between themselves and per- 
sons who have authority over prospective research participants. 

Beneficence 



Beneficence means being kind, or a charitable act or gift. In the research
context, the ethical principle of beneficence has its origins in the famous edict
of the Hippocratic Oath, which has been taken by physicians since ancient 
times: "First, do no harm." Above all, researchers should not harm their 
participants and, ultimately, the benefits to their participants should be 
maximized and potential harms and discomforts should be minimized. In 
conducting research, the progress of science should not come at the price 
of harm to research participants. For example, even if the Tuskegee ex- 
periments had resulted in important information on the course of syphilis 
(which remains unclear), the government did not have the right to place 
individuals at risk of harm and death to obtain this information. 

Importantly, the edict "do no harm" is probably more easily adhered to 
in clinical practice in which clinicians employ well-established and
well-validated procedures. The potential risks and benefits are typically less
predictable in the context of research in which new procedures are being 
tested. This poses an important ethical dilemma for researchers. On the 
one hand, the researcher may have a firm basis for believing and hypothe- 
sizing that a specific treatment will be helpful and beneficial. On the other 
hand, because it has not yet been tested, he or she can only speculate about 
the potential harm and side effects that may be associated with the treat- 
ment or intervention. 

To determine whether a research protocol has an acceptable risk/ben- 
efit ratio, the protocol describing all aspects of the research and potential 
alternatives must be reviewed. According to the Belmont Report, there 
should also be close communication between the IRB and the researcher. 
The IRB should (1) determine the validity of the assumptions on which 
the research is based, (2) distinguish the nature of the risk, and (3) deter- 
mine whether the researcher's estimates of the probability of harm or ben- 
efits are reasonable. 

The Belmont Report delineates five rules that should be followed in de- 
termining the risk/benefit ratio of a specific research endeavor (National 
Commission for the Protection of Human Subjects of Biomedical and 
Behavioral Research, 1979, p. 8): 

1. Brutal or inhumane treatment of human subjects is never 
morally justified. 

2. Risks should be reduced to those necessary to achieve the re- 
search objective. It should be determined whether it is in fact 
necessary to use human subjects at all. Risk can perhaps never 
be entirely eliminated, but it can often be reduced by careful 
attention to alternative procedures. 

3. When research involves significant risk of serious impairment, 
review committees should be extraordinarily insistent on the 
justification of the risk (looking usually to the likelihood of benefit
to the subject or, in some rare cases, to the manifest voluntariness
of the participation).

4. When vulnerable populations are involved in research, the
appropriateness of involving them should itself be demonstrated.
A number of variables go into such judgments, including the na- 
ture and degree of risk, the condition of the particular popula- 
tion involved, and the nature and level of the anticipated bene- 
fits. 

5. Relevant risks and benefits must be thoroughly arrayed in docu- 
ments and procedures used in the informed consent process. 



Justice 

The principle of justice relates most directly to the researcher's selection 
of research participants. According to the Belmont Report, the selection 
of research participants must be the result of fair selection procedures and 
must also result in fair selection outcomes. The justness of participant se- 
lection relates both to the participant as an individual and to the partici- 
pant as a member of social, racial, sexual, or ethnic groups. Importantly, 
there should be no bias or discrimination in the selection and recruitment 
of research participants. In other words, they should not be selected be- 
cause they are viewed positively or negatively by the researcher (e.g., in- 
volving so-called undesirable persons in risky research). 

In addition to the selection of research participants, the principle of jus- 
tice is also relevant to how research participants are treated, or not treated. 
As we discussed in Chapter 5, the use of control conditions is essential to
randomized, controlled studies, which offer the only true method for confidently
evaluating the effectiveness of a specific treatment or intervention.
The dilemma here is whether it is ethical or just to assign some participants 
to receive a potentially helpful intervention, and others to not receive it. 
Although this may be less an issue in certain types of research, it is a criti- 
cal issue in medical studies involving treatment for debilitating conditions, 
or in criminal justice or social policy research involving potentially life- 
changing opportunities. One might ask why the researcher could not 
simply ask for volunteers for the control condition. The answer to this 
question is that participants' awareness of being in a control condition
may alter the results. It is therefore necessary to blind the participants (i.e., 
to keep participants unaware of their experimental assignments), which 
raises yet another potential ethical dilemma. 

Fortunately, there are several ways to address these ethical concerns. 
First, the research participants must be clearly informed that they will be 
randomly assigned to either an experimental condition or a control condi- 
tion, and they should also be informed of the likelihood (e.g., one in two, 
one in three) of being assigned to one condition or the other. Second, the 
researcher should assure participants that they will receive full disclosure 
regarding their assignment following the completion of the study, and the 
researcher should provide the opportunity to those who had been as- 
signed to the control condition to receive the experimental treatment if it 
is shown to be effective. 
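
The two safeguards just described can be illustrated with a brief sketch. It is offered only as an example under stated assumptions, not as a prescribed procedure: the function names, the condition labels, the participant codes, and the seed value are all hypothetical. The sketch computes the odds of assignment that would be disclosed during consent, performs a simple random assignment, and lists the control participants who would be offered the experimental treatment if it is shown to be effective.

```python
import random
from fractions import Fraction


def allocation_odds(weights):
    """Return the chance of assignment to each condition as an exact fraction.

    These are the odds (e.g., one in two, one in three) that would be
    disclosed to participants during the consent process.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


def assign(participant_ids, weights, seed):
    """Randomly assign each participant to one condition.

    This is simple randomization: each participant is assigned independently,
    so group sizes are not guaranteed to match the weights exactly.
    The seed makes the assignment reproducible for later audit.
    """
    rng = random.Random(seed)
    names = list(weights)
    w = [weights[name] for name in names]
    return {pid: rng.choices(names, weights=w, k=1)[0] for pid in participant_ids}


def owed_treatment_offer(assignments, control="control"):
    """List participants in the control condition, who should be offered
    the experimental treatment if it is shown to be effective."""
    return sorted(pid for pid, cond in assignments.items() if cond == control)


# Hypothetical two-condition study with equal allocation.
weights = {"experimental": 1, "control": 1}
odds = allocation_odds(weights)
assignments = assign(["P01", "P02", "P03", "P04"], weights, seed=2005)
follow_up = owed_treatment_offer(assignments)
```

In this sketch, `odds` holds the likelihood that would appear in the consent form, and `follow_up` holds the participant codes that would be contacted at full disclosure following the completion of the study.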



DON'T FORGET 

Confidentiality 

The right to confidentiality is embodied in the principles of respect for 
persons, beneficence, and justice. Generally, confidentiality involves both 
an individual's right to have control over the use or access of his or her
personal information as well as the right to have the information that he 
or she shares with the research team kept private. The researcher is 
responsible not only for maintaining the confidentiality of all information 
protected by law, but also for information that might affect the privacy 
and dignity of research participants. During the consent process, the re- 
searcher must clearly explain all issues related to confidentiality, including 
who will have access to their information, the limits of confidentiality, risks 
related to potential breaches of confidentiality, and safeguards designed to 
protect their confidentiality (e.g., plans for data transfer, data storage, and 
recoding and purging data of client identifiers). Researchers should be 
aware of the serious effects that breaches in confidentiality could have on 
the research participants, and employ every safeguard to prevent such 
violations, including careful planning and training of research staff. Re- 
searchers should also familiarize themselves with all applicable institu- 
tional, local, state, and federal regulations governing their research. 
Federal Research Protections

There are two primary categories of federal research protections for human
participants. The first is provided in the Federal Policy for the Protection
of Human Subjects, also known as the Common Rule. The Common
Rule is a set of regulations adopted independently by 17 federal agencies
that support or conduct research with human research participants. The
17 agencies adopted regulations based on the language set forth in Title
45, Part 46, Subpart A, of the Code of Federal Regulations (CFR). Thus, the
Common Rule is, for most intents and purposes, Subpart A of the Department
of Health and Human Services' regulations. The second category
of federal protections that relates to human research participants is
the set of rules governing drug, device, and biologics research. These rules
are administered by the U.S. Food and Drug Administration (FDA).
Specifically, the FDA regulates research involving products regulated by
the FDA, including research and marketing permits for drugs, biological 
products, and medical devices for human use, regardless of whether fed- 
eral funds are used. 



To ensure that the basic tenets of the Belmont Report were adhered to, 
the federal government, through the Department of Health and Human 
Services, codified a set of research-related regulations. Known as 45 CFR 
46, indicating the specific Title 45 and Part 46 of the Code of Federal Regula- 
tions, the document details the regulations that must be observed when 
conducting research with human participants (see Rapid Reference 8.5). 
In general, the federal regulations focus on two main areas that are inte- 
gral to the protection of human participants: informed consent and insti- 
tutional review boards. 

INFORMED CONSENT 

The principal mechanism for describing the research study to potential
participants and providing them with the opportunity to make autonomous
and informed decisions regarding whether to participate is informed
consent. For this reason, informed consent has been characterized as
the cornerstone of human rights protections. The three basic elements of 
informed consent are that it must be (1) competent, (2) knowing, and (3) 
voluntary. Notably, each of these three prongs may be conceptualized as 
having its own unique source of vulnerability. In the context of research, 
these potential vulnerabilities may be conceptualized as stemming from 
sources that may be intrinsic, extrinsic, or relational (Roberts & Roberts, 
1999): 

1. Intrinsic vulnerabilities are personal characteristics that may limit an
individual's capacities or freedoms. For instance, an individual
who is under the influence of a psychoactive substance or is actively
psychotic might have difficulty comprehending or attending
to consent information. Such vulnerabilities relate to the
first prong of informed consent, that of competence (also re- 
ferred to in the literature as "decisional capacity"). Many theo- 
rists have broadly conceptualized competence to include such 
functions as understanding, appreciation, reasoning, and ex- 
pressing a choice (Appelbaum & Grisso, 2001). However, these 
functions are directly related to the legal and ethical concept of 
competence only insofar as they refer to an individual's intrinsic 
capability to engage in these functions. 

2. Extrinsic vulnerabilities are situational factors that may limit the ca- 
pacities or freedoms of the individual. For example, an individ- 
ual who has just been arrested or who is facing sentencing may 
be too anxious or confused, or may be subject to implicit or ex- 
plicit coercion to provide voluntary and informed consent. Such 
extrinsic vulnerabilities may relate either to knowingness or to 
voluntariness to the degree that the situation, not the individ- 
ual's capacity, prevents him or her from making an informed 
and autonomous decision. 

3. Relational vulnerabilities occur as a result of a relationship with 
another individual or set of individuals. For example, a prisoner 
who is asked by the warden to participate in research is unlikely 
to feel free to decline. Similarly, a terminally ill person recruited
into a study by a caregiver may confuse the caregiving and re- 
search roles. Relational vulnerabilities typically relate to the third 
prong of the informed consent process, voluntariness. Certain 
relationships may be implicitly coercive or manipulative because 
they may unduly influence the individual's decision. 

Competence 

The presence of cognitive impairment or limited understanding does not 
automatically disqualify individuals from consenting or assenting to re- 
search studies. As discussed, the principle of respect for persons asserts 
that these individuals should have every right to participate in research if 
they so choose. According to federal regulations (45 CFR § 46.111[b]),
"When some or all of the subjects are likely to be vulnerable to coercion 
or undue influence, such as children, prisoners, pregnant women, mentally 
disabled persons, or economically or educationally disadvantaged persons, 
additional safeguards have been included in the study to protect the rights 
and welfare of these subjects." Therefore, the critical issue is not whether 
they should be allowed to participate, but whether their condition leads to 
an impaired decisional capacity. 

To our knowledge, there has been only one instrument developed 
specifically for this purpose, the MacArthur Competence Assessment 
Tool for Clinical Research (Appelbaum & Grisso, 2001). Developed by 
two of the leading authorities in consent and research ethics, the instru- 
ment provides a semistructured interview format that can be tailored to 
specific research protocols and used to assess and rate the abilities of po- 
tential research participants in four areas that represent part of the stan- 
dard of competence to consent in many jurisdictions. The instrument 
helps to determine the degree to which potential participants (1) under- 
stand the nature of the research and its procedures; (2) appreciate the con- 
sequences of participation; (3) show the ability to consider alternatives, in- 
cluding the option not to participate; and (4) show the ability to make a 
reasoned choice. Although this instrument appears to be appropriate for 
assessing competence, researchers should make certain to carefully consult
local and institutional regulations before relying solely on this type of
instrument. Depending on the specific condition of the potential partici- 
pants, researchers may want to engage the services of a specialist (e.g., a 
neurologist, child psychologist) when making competence determina- 
tions. 

Importantly, researchers should not mistakenly interpret potential par- 
ticipants' attentiveness and agreeable comments or behavior as evidence 
of their competence because many cognitively impaired persons retain at- 
tentiveness and social skills. Similarly, performance on brief mental status 
exams should not be considered sufficient to determine competence, al- 
though such information may be helpful in combination with other com- 
petence measures. 

If the potential research participant is determined to be competent to 
provide consent, the researcher should obtain the participant's informed 
consent. If the potential participant is not sufficiently competent, in- 
formed consent should be obtained from his or her caregiver or surrogate 
and assent should be obtained from the participant. 

Knowingness 

It is still not clear whether many research participants actually participate 
knowledgeably in decision making about their research involvement. In 
fact, evidence suggests that participants in clinical research often fail to 
understand or remember much of the information provided in consent 
documents, including information relevant to their autonomy, such as the 
voluntary nature of participation and their right to withdraw from the 
study at any time without negative repercussions. 

Problems with the understanding of both research and treatment protocols
have been widely reported (e.g., Dunn & Jeste, 2001). Studies indicate
that research participants often lack awareness of being participants
in a research study, have poor recall of study information, have inadequate
recall of important risks of the procedures or treatments, lack understanding
of randomization procedures and placebo treatments, lack
awareness of the ability to withdraw from the research study at any time,
and are often confused about the dual roles of clinician versus researcher
(Appelbaum, Roth, & Lidz, 1982; Cassileth, Zupkis, Sutton-Smith, &
March, 1980; Sugarman, McCrory, & Hubal, 1998).

The Therapeutic Misconception

The therapeutic misconception occurs when research participants confuse
the general intentions of research with those of treatment, or the role of
researchers with the role of clinicians. This misconception refers specifically
to the mistaken belief that the principle of personal care applies even in
research settings. This may also be seen as a sort of "white-coat
phenomenon," in which, as a result of their learning history, individuals may hold on
to the mistaken belief that any doctor or professional has only their best
interests in mind. This may compromise their ability to accurately weigh
the potential risks and benefits of participating in a particular study.

A number of client variables are associated with the understanding of 
consent information. Several studies (e.g., Aaronson et al., 1996; Agre, 
Kurtz, & Krauss, 1994; Bjorn & Holm, 1999) found educational and vo- 
cabulary levels to be significantly and positively correlated with measures 
of understanding of consent information. Although age alone has not been
consistently associated with diminished performance on consent quizzes, 
it does appear to interact with education in that older individuals with less 
education display decreased understanding of consent information (Taub, 
Baker, Kline, & Sturr, 1987). 

Drug and alcohol abusers may present a unique set of difficulties in 
terms of their comprehension and retention of consent information, not 
only because of the mental and physical reactions to the psychoactive sub- 
stances, but also because of the variety of conditions that are comorbid 
with substance abuse (McCrady & Bux, 1999). Acute drug intoxication or 
withdrawal can impair attention, cognition, or retention of important in- 
formation (e.g., Tapert & Brown, 2000). Limited educational opportuni- 
ties, chronic brain changes resulting from long-term drug or alcohol use, 
prior head trauma, poor nutrition, and comorbid health problems (e.g.,
AIDS-related dementia) are common in individuals with substance abuse 
or dependence diagnoses and may also reduce concentration and limit
understanding during the informed consent process (McCrady & Bux, 1999).

Although the number of articles published on informed consent has in- 
creased steadily over the past 30 years (Kaufmann, 1983; Sugarman et al., 
1999), the number of studies that have actually tested methods for im- 
proving the informed consent process is quite limited. In their 2001 ar- 
ticle, Dunn and Jeste reviewed a total of 34 experimental studies that had 
examined the effects of interventions designed to increase understanding 
of informed consent information. Of the 34 studies reviewed, 25 found 
that participants' understanding or recall showed improvement using a 
limited array of interventions. The strategies that have proven most suc- 
cessful fall into two broad categories: (1) those focusing on the structure of 
the consent document, and (2) those focusing on the process of presenting 
consent information. Successful strategies directed toward the structure 
of the consent form involved the use of forms that were more highly struc- 
tured, better organized, shorter, and more readable, and that used simpli- 
fied and illustrated formats. Successful strategies involving the consent 
process included corrected feedback and multiple learning trials, and the 
use of summaries of consent information. Other efforts that were gener- 
ally not successful or that showed mixed results included the use of video- 
tape methodologies and the use of highly detailed consent information, 
which were not associated with improved understanding in either a re- 
search or clinical context. 

Other strategies have been shown to help individuals remember con- 
sent information beyond the initial testing period. This has specific im- 
portance in that it speaks to the ability of research participants to retain 
information related to (1) their right to withdraw from the research study 
at any time with no negative consequences, (2) procedures for contacting 
designated individuals in case of an adverse event, and (3) procedures
for obtaining compensation for harm or injury incurred as a result
of study participation. Successful strategies for improving recall of consent
information have included making postconsent telephone contacts,
using simplified and illustrated presentations, and providing corrected 
feedback and multiple learning trials. Still, there is much room for im- 
provement and research should continue to explore methods of improv- 
ing participants' comprehension and retention of consent information. 

Voluntariness 

The issue of whether consent is voluntary is of particular importance 
when conducting research with disenfranchised and vulnerable popula- 
tions, such as individuals involved with the criminal justice system. These 
populations are regularly exposed to implicit and explicit threats of coer- 
cion, deceit, and other kinds of overreaching that may jeopardize the ele- 
ment of voluntariness. In particular, there is a substantial risk that, as a re- 
sult of their current situation, they may become convinced, rightly or 
wrongly, that their future depends on cooperating with authorities. This 
source of vulnerability is very different from knowingness or competence, 
because even the most informed and capable individual may not be able to 
make a truly autonomous decision if he or she is exposed to a potentially 
coercive or compromising situation. 

Despite the obvious importance of this central element of informed 
consent, virtually no studies have examined potential methods for decreasing
coercion in research. McCrady and Bux (1999) surveyed a sample
of researchers funded by the National Institutes of Health who were currently
recruiting participants from settings considered to be implicitly
coercive (e.g., inpatient units, detoxification facilities, prisons). The
researchers were surveyed about the types of procedures they used to ensure
that participants were free from coercion. Among the most commonly re- 
ported protections were (1) discussing with participants the possibility of 
feeling coerced, (2) obtaining consent from the individuals responsible for 
the participants, (3) changing the compensation to prevent the coercive effects
of monetary incentives, (4) making clear that treatment is not influenced
by participation in research, (5) reminding participants that participation
is voluntary, (6) having participants delay consent to think about
participation, and (7) providing a clear list of treatment options as an al- 
ternative to research. 



Developing a Consent Form 

Given the importance of informed consent and the many problems re- 
garding its comprehension and retention, researchers should be careful to 
provide consent information to potential research participants or their 
representatives in language that is understandable and clear. Typically, in- 
formed consent must be documented by the use of a written consent form 
approved by the IRB and signed by the participant or the participant's 
legally authorized representative, as well as a witness. One copy should 
then be given to the individual signing the form and another copy should 
be kept by the researcher. The basic elements of a consent form include 
each of the following: 

1. An explanation of the purpose of the study, the number of par- 
ticipants that will be recruited, the reason that they were se- 
lected, the amount of time that they will be involved, their re- 
sponsibilities, and all experimental procedures. 

2. A description of any potential risks to the participant. 

3. A description of any potential benefits to the participant or to 
others that may reasonably be expected from the research. 

4. A description of alternative procedures or interventions, if any, 
that are available and that may be advantageous to the participant. 

5. A statement describing the extent, if any, to which confiden- 
tiality of records identifying the participant will be maintained. 

6. For research involving more than minimal risk, an explanation as 
to whether any compensation will be provided and whether any 
medical treatments are available if injury occurs and, if so, what 
they consist of, or where further information may be obtained. 

7. Information about who can be contacted in the event that participants
require additional information about their rights or
specific study procedures, or in the event of a research-related 
injury or adverse event. The document should provide the 
names and contact information for specific individuals who 
should be contacted for each of these concerns. Many IRBs re- 
quire that a consent form include a contact person not directly 
affiliated with the research project, for questions or concerns 
related to research rights and potential harm or injury. 

8. A clear statement explaining that participation is completely 
voluntary and that refusal to participate will involve no penalty 
or loss of benefits to which the participant is otherwise enti- 
tled. 

9. A description of circumstances under which the study may be 
terminated (e.g., loss of funding). 

10. A statement that any new findings discovered during the
course of the research that may relate to the participant's willingness
to continue participation will be provided to the participant.

Under federal regulations contained in 45 CFR § 46.116(d), an IRB may
approve a waiver or alteration of informed consent requirements whenever 
it finds and documents all of the following: 

1. The research involves no more than minimal risk to participants.

2. The waiver or alteration will not adversely affect the rights and welfare 
of participants. 

3. The research could not practicably be carried out without the waiver 
or alteration. 

4. Where appropriate, the participants will be provided with addi- 
tional pertinent information after participation. 
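
Because the IRB must find and document all of these conditions, the determination can be thought of as a simple conjunction. The sketch below is illustrative only: the class and field names are hypothetical, and treating the fourth condition as satisfied when the IRB judges additional information not to be appropriate is an assumed reading of the phrase "where appropriate." It is not a substitute for an actual IRB determination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WaiverFindings:
    """Findings an IRB would document for a waiver or alteration request."""
    minimal_risk: bool                   # condition 1
    rights_and_welfare_protected: bool   # condition 2
    impracticable_without_waiver: bool   # condition 3
    # Condition 4 applies only "where appropriate". In this sketch, None
    # means the IRB judged that providing additional information afterward
    # is not appropriate for the study (an assumption of this example).
    debriefing_planned: Optional[bool] = None


def waiver_may_be_approved(findings):
    """Return True only when every applicable condition is met."""
    debriefing_ok = findings.debriefing_planned is not False
    return (
        findings.minimal_risk
        and findings.rights_and_welfare_protected
        and findings.impracticable_without_waiver
        and debriefing_ok
    )


# Hypothetical request: minimal risk, but a debriefing that the IRB
# considers appropriate has not been planned.
request = WaiverFindings(True, True, True, debriefing_planned=False)
decision = waiver_may_be_approved(request)
```

The point of the sketch is that a single unmet condition is enough to rule out the waiver, regardless of how strongly the other conditions are satisfied.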

The IRB may also approve a waiver of the requirement for written doc- 
umentation of informed consent under limited circumstances described at 
45 CFR § 46.117(c). 
INSTITUTIONAL REVIEW BOARDS

All research with human participants in the United States is regulated by 
institutional review boards (IRBs). As mentioned earlier, before any re- 
search study can be conducted, the researcher must have the procedures 
approved by an IRB. 

IRBs are formed by academic, research, and other institutions to pro- 
tect the rights of research participants who are participating in studies be- 
ing conducted under the jurisdiction of the IRBs. IRBs have the authority 
to approve, require modifications of, or disapprove all research activities 
that fall within their jurisdiction as specified by both the federal
regulations and local institutional policy. Researchers are responsible for
complying with all IRB decisions, conditions, and requirements.

Researchers planning to conduct research studies must begin by 
preparing written research protocols that provide complete descriptions 
of the proposed research (see Rapid Reference 8.6). The protocol should 
include detailed plans for the protection of the rights and welfare of 
prospective research participants and make certain that all relevant laws 
and regulations are observed. Once the written protocol is completed, it is 
sent to the appropriate IRB along with a copy of the consent form and any 
additional materials (e.g., test materials, questionnaires). The IRB will then
review the protocol and related materials. 

According to 45 CFR § 46.107, IRBs must have at least five members,
including the IRB chairperson, although most have far more. IRBs should be
made up of individuals of varying disciplines and backgrounds. This het- 
erogeneity is necessary to ensure that research protocols are reviewed from 
many different perspectives. This includes having researchers, laypeople, 
individuals from different disciplines, and so on. For example, an IRB may 
include scientists and/or methodologists who are familiar with research 
and statistical issues; social workers who are familiar with social, familial, 
and support issues; physicians and psychologists who are familiar with 
physical and emotional concerns; lawyers who can address legal issues; and 
clergy who can address spiritual and community issues. And when
protocols involve vulnerable populations, such as children, prisoners, pregnant
women, or handicapped or mentally disabled persons, the IRB must
consider the inclusion of one or more individuals who are knowledgeable
about and experienced in working with these potential participants.

IRB Review: Protocol Submission Overview

1. Introduction and rationale for study.

2. Specific aim(s).

3. Outcomes to be measured.

4. Number of participants to be enrolled per year and in total.

5. Considerations of statistical power in relation to enrollment.

6. Study procedures.

7. Identification of the sources of research material obtained from
individually identifiable living human participants in the form of
specimens, records, or data.

8. Sample characteristics (i.e., anticipated number, ages, gender, ethnic
background, and health status). Inclusion and exclusion criteria.
Rationale for use of vulnerable populations (i.e., prisoners, pregnant women,
disabled persons, drug users, children) as research participants.

9. Recruitment procedures, nature of information to be provided to
prospective participants, and the methods of documenting consent.

10. Potential risks and benefits of participation. (Are the risks to
participants reasonable in relation to the anticipated benefits to participants
and in relation to the importance of the knowledge that may
reasonably be expected to result from the research?)

11. Procedures for protecting against or minimizing potential risks. Plans
for data safety monitoring and addressing adverse events if they
occur. Alternative interventions and procedures that might be
advantageous to the participants.

12. Inclusion of or rationale for excluding children (rationale to be based
on specific regulations outlined in 45 CFR § 46).

In addition to their diversity and professional competence, IRBs must 
have a clear understanding of federal and institutional regulations so that 
they can determine whether the proposed research is in line with
institutional regulations, applicable law, and standards of professional conduct
and practice. Importantly, IRBs are required to have at least one member 
who has no affiliation with the institution (even through an immediate 
family member). Finally, the IRB must make every effort to ensure that it 
does not consist entirely of men or entirely of women, although selections 
cannot be made on the basis of gender. 
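
For readers who find it helpful to see the membership requirements laid out as explicit checks, the following is a minimal sketch in Python. It is an illustration only, not part of any regulation or IRB software. The Member record, its field names, and the function composition_problems are hypothetical, and treating "two or more disciplines" as the test of varied backgrounds is our own simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    gender: str
    discipline: str
    affiliated: bool  # True if the member (or immediate family) is tied to the institution


def composition_problems(members: List[Member]) -> List[str]:
    """List the ways a proposed board falls short of the requirements described above."""
    problems = []
    if len(members) < 5:
        problems.append("fewer than five members")
    if all(m.affiliated for m in members):
        problems.append("no unaffiliated member")
    if members and len({m.gender for m in members}) == 1:
        # The text describes this as something the IRB must make every effort to avoid.
        problems.append("all members share one gender")
    if len({m.discipline for m in members}) < 2:
        # Assumption: "varying disciplines" is approximated as at least two.
        problems.append("no diversity of disciplines")
    return problems


board = [
    Member("A", "F", "statistics", True),
    Member("B", "M", "social work", True),
    Member("C", "F", "medicine", True),
    Member("D", "M", "law", True),
    Member("E", "F", "clergy", False),
]
print(composition_problems(board))
```

An empty list means that none of the checks in this sketch was triggered; it does not mean that a real IRB would be properly constituted, because actual boards must also weigh professional competence and familiarity with vulnerable populations.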

One of the initial questions an IRB must ask when reviewing a research 
protocol is whether that IRB has jurisdiction over the research. That is, the 
IRB must ask, "Is the research subject to IRB review?" To answer this 
question, the IRB must determine (1) whether the activity involves research 
and (2) whether it involves human participants. Research is defined by the fed- 
eral regulations as "a systematic investigation, including research develop- 
ment, testing and evaluation, designed to develop or contribute to gener- 
alizable knowledge" (45 CFR § 46.102[d]). Human participants are defined 
by the regulations as "living individual(s) about whom an investigator 
(whether professional or student) conducting research obtains (1) data 
through intervention or interaction with the individual, or (2) identifiable 
private information" (45 CFR § 46.102[f]).

Some types of research involving human participants may be exempt 
from IRB review (45 CFR § 46.101[b]). These include certain types of
educational testing and surveys for which no identifying information is
collected or recorded. In such instances, the participants would not be at risk
of any breach of confidentiality. 

If the study is not deemed to be exempt from IRB review, the IRB must
determine whether the protocol needs to undergo expedited review or full
review. To meet the requirements for expedited review, a study must involve
no more than minimal risk and fall into one of several specific categories,
such as survey research or research on nonsensitive topics. Federal
regulations define minimal risk as meaning that the "probability and
magnitude of harm or discomfort anticipated in the research are no greater
in and of themselves from those ordinarily encountered in daily life or
during the performance of routine physical or psychological examination or
tests" (45 CFR § 46.110[b]). Expedited review can also be obtained for
minor changes in previously approved research protocols during the period
(of one year or less) for which the original protocol was authorized.
Expedited reviews can be handled by a single IRB member (often the chair) and
therefore are much more expeditious (as the name suggests).
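
The sequence of questions an IRB works through (Is this research? Does it involve human participants? Is it exempt? Does it qualify for expedited review?) can be sketched as a small decision function. The Python below is a simplified illustration under stated assumptions, not a statement of how any particular IRB operates. The Protocol fields and the function review_level are hypothetical names, and the sketch treats minimal risk and membership in a listed category as jointly required for expedited review, which is our reading of 45 CFR § 46.110(b).

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    # Hypothetical yes/no summaries of judgments the IRB itself must make.
    is_research: bool
    involves_human_participants: bool
    qualifies_for_exemption: bool
    minimal_risk: bool
    in_expedited_category: bool
    minor_change_to_approved_protocol: bool = False


def review_level(p: Protocol) -> str:
    """Return the review track suggested by the decision sequence in the text."""
    # Jurisdiction: both conditions must hold for the IRB to review at all.
    if not (p.is_research and p.involves_human_participants):
        return "not subject to IRB review"
    if p.qualifies_for_exemption:
        return "exempt"
    # Minor changes to an already approved protocol may be expedited.
    if p.minor_change_to_approved_protocol:
        return "expedited"
    # Assumption: minimal risk AND a listed category are both needed.
    if p.minimal_risk and p.in_expedited_category:
        return "expedited"
    return "full"


survey = Protocol(
    is_research=True,
    involves_human_participants=True,
    qualifies_for_exemption=False,
    minimal_risk=True,
    in_expedited_category=True,
)
print(review_level(survey))
```

Each Boolean field stands in for a judgment that, in practice, requires careful deliberation by the IRB; the sketch only makes the order of the questions explicit.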

Protocols that do not meet the criteria for expedited review must
receive a full review by all members of the IRB. Under full review, all members
of the IRB receive and review the protocol, consent, and any additional 
materials prior to their scheduled meeting. Depending on the particular 
IRB and the number of protocols that they normally review, an IRB may 
meet anywhere from biweekly to quarterly. Following a thorough review 
and discussion of issues and concerns within the committee, many IRBs 
invite the researchers in to answer specific questions from the IRB mem- 
bers. Questions may address any or all aspects of the research procedures. 
After all of the IRB's questions have been answered and the researchers 
leave the room, the committee votes to either grant approval or not. In 
most cases, the committee will vote to withhold approval pending certain 
modifications or changes to the protocol or the consent procedures. Once 
the modifications are made, the protocol must be resubmitted. If the IRB 
is satisfied that the necessary modifications were made, they will typically 
grant approval and provide the researcher with a copy of the study con- 
sent form bearing the IRB's stamped, dated approval. Only copies of this 
stamped consent form may be used to obtain informed consent from 
study participants. Although IRB approval can be granted for one full 
year, certain studies (often those involving a less clear risk/benefit ratio) 
may receive approval for 6 months or less. In any case, researchers must 
make certain to keep approvals and consent forms current. If the study is 
approved, the researcher is then responsible for reporting the progress of 
the research to the IRB and/or appropriate institutional officials as often 
as (and in the manner) prescribed by the IRB, but no less than once per 
year (45 CFR § 46.109[e]).

DATA SAFETY MONITORING 

Concerns about respect, beneficence, and justice are not entirely put to 
rest by institutional review and informed consent. Although these
processes ensure the appropriateness of the research protocol and allow
potential participants to make autonomous informed decisions, they do not
provide for ongoing oversight that may be necessary to maintain the safety
and ethical protections of participants as they proceed through the
research experience. Providing that oversight may require the development of a
data safety monitoring plan (DSMP).

DSMPs set specific guidelines for the regular monitoring of study pro- 
cedures, data integrity, and adverse events or reactions to certain study 
procedures. According to federal regulations (45 CFR § 46.111[a][6]),
"[W]hen appropriate, the research plan makes adequate provision for
monitoring the data collected to ensure the safety of subjects." The NIH,
along with other public and private agencies, has developed specific
criteria for their DSMPs. For example, for Phase I and Phase II NIH clinical
trials (NIH, 1998), researchers are required to provide a DSMP as part of 
their grant applications. DSMPs are then reviewed by the scientific review 
groups, who provide the researchers with feedback. Subsequently, re- 
searchers are required to submit more detailed monitoring plans as part of 
their protocols when they apply for IRB approval. 

In addition to the DSMP, researchers may be required by their funding 
agencies or IRBs to establish a data safety monitoring board (DSMB). The 
DSMB serves as an external oversight committee charged with protecting 
the safety of participants and ensuring the integrity of the study. The 
DSMBs, which must be very familiar with the research protocols, are 
responsible for periodically reviewing outcome data to determine whether 
participants in one condition or another are facing undue harm as a result 
of certain experimental interventions. The DSMBs may also monitor 
study procedures such as enrollment, completion of forms, record keep- 
ing, data integrity, and the researchers' adherence to the study protocol. 
Based on these data, the DSMB can make specific recommendations re- 
garding appropriate modifications. In trials that are conducted across sev- 
eral programs or agencies (i.e., multicenter trials), DSMBs may act as over- 
arching IRBs that are responsible for the ethical oversight of the entire 
project.

ADVERSE AND SERIOUS ADVERSE EVENTS

Researchers are required to report (to the governing IRBs) any untoward 
or adverse events involving research participants during the course of 
their research involvement. Although the specific reporting requirements 
differ by IRB and funding source, the definitions of adverse events (origi- 
nating in the FDA's definitions of adverse events in medical trials) are gen- 
erally the same. 

An adverse event (AE) is defined as any untoward medical problem that 
occurs during a treatment or intervention, whether it is deemed to be re- 
lated to the intervention or not. A serious adverse event (SAE) is defined as 
any occurrence that results in death; is life-threatening; requires inpatient 
hospitalization or prolongation of existing hospitalization; or creates
persistent or significant disability/incapacity or a congenital anomaly/birth
defect.
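
The distinction between an AE and an SAE can be expressed as a simple classification rule. The Python sketch below is illustrative only. The outcome labels in SERIOUS_OUTCOMES and the function classify_event are hypothetical names of our own, and actual reporting criteria differ by IRB and funding source. Note that, consistent with the definition above, the rule ignores whether the event is thought to be related to the intervention.

```python
# Assumed outcome labels, paraphrasing the SAE definition in the text.
SERIOUS_OUTCOMES = {
    "death",
    "life-threatening",
    "inpatient hospitalization",
    "prolonged hospitalization",
    "persistent or significant disability",
    "congenital anomaly",
}


def classify_event(outcomes):
    """Return 'SAE' if any recorded outcome is on the serious list, otherwise 'AE'."""
    return "SAE" if SERIOUS_OUTCOMES & set(outcomes) else "AE"


print(classify_event(["nausea"]))
print(classify_event(["nausea", "inpatient hospitalization"]))
```

The function assumes that an untoward event has already been recorded; deciding whether something counts as an event at all, and how quickly it must be reported, remains a matter for the study's monitoring plan and the governing IRB.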



SUMMARY 

This chapter was intended to provide a general history and overview of 
some of the central ethical issues relating to the conduct of scientific
research. Unfortunately, comprehensive coverage of many specific topics in
research ethics (e.g., publication credit, reporting research results,
plagiarism) was beyond the scope of this chapter. Therefore, we strongly
recommend that
readers refer to specific ethical codes and federal, local, and institutional 
regulations when planning and engaging in research. 

The many revelations of human rights violations and atrocities in the 
name of scientific research have led to a heightened public awareness 
about the need for regulations to protect the rights of human research 
participants. In response to this heightened awareness and call for protec- 
tions, the federal government has established an extensive system of reg- 
ulations and guiding principles to promote respect for persons, benefi- 
cence, and justice in research with human participants. These regulations 
have helped to delineate the specific types of information that must be
conveyed to potential research participants in an effort to ensure that
consent to research is voluntary, knowing, and intelligent. In addition, these
regulations have generated mandatory ethical oversight of research stud- 
ies. Despite these many developments, there is still a need for further re- 
search in the area of ethical protections in research studies. If anything has 
been learned in the years since Nuremberg and Tuskegee, it is that we must 
continue to be vigilant in protecting the rights and interests of our human 
research participants. 



TEST YOURSELF

1. The three principles set forth by the Belmont Report are (1) respect for
persons, (2) beneficence, and (3) ________.

2. Beneficence has its origins in the famous edict of the Hippocratic oath,
which states, "First, do no ________."

3. In most cases, before an individual can participate in any research study,
he or she must provide ________.

4. Before any study can take place, it must first be approved by an
________.

5. The three basic elements of informed consent are that it must be (1)
competent, (2) knowing, and (3) ________.

Answers: 1. justice; 2. harm; 3. informed consent; 4. institutional review board (or human
subjects committee); 5. voluntary






Nine 

DISSEMINATING RESEARCH RESULTS 
AND DISTILLING PRINCIPLES OF 
RESEARCH DESIGN AND METHODOLOGY 



At this point in the book, you should have a fairly good
conceptualization of the major considerations that are involved in
conducting a research study. In the preceding chapters, we have
covered each step in the process of conducting research, from the earliest
stages — choosing a research idea, articulating hypotheses, and selecting an 
appropriate research design — to the final stages — analyzing the data and 
drawing valid conclusions. Along the way, we have also discussed several 
important research-related considerations, including several types of va- 
lidity, methods of controlling artifact and bias, and the ethical issues in- 
volved in conducting research. Although you may not feel like an expert in 
research yet, you should take comfort in knowing that the concepts and 
strategies that you learned from this book will provide you with a solid 
foundation of research-related knowledge. As you gain additional re- 
search experience, these concepts and strategies will become second na- 
ture. We have certainly covered a good deal of information in this book, 
but we are not quite finished yet. 

In this concluding chapter, we will discuss what is often considered the 
final step of conducting a research study: disseminating the results of the 
research. As will be discussed, there are numerous options available for 
those researchers who desire to share the results of their studies with oth- 
ers. From books to journals to the Internet, today's society offers many ef- 
fective and efficient outlets for the dissemination of research study results. 
After discussing the dissemination of research results, the final part of this 
chapter will present a distillation of the major principles of research design
and methodology. Finally, to assist the reader in designing a sound
research study, this chapter will include a checklist of the major
research-related concepts and considerations we have covered in this book.

DISSEMINATING THE RESULTS OF RESEARCH STUDIES 

This book would certainly be incomplete if we did not discuss the dissem- 
ination of research results. This is an important topic that is occasionally 
overlooked in research design and methodology textbooks. As we will see, 
the dissemination of research study results plays a vital role in the ad- 
vancement of science and, consequently, in the way we all live. 

If you recall, at the beginning of this book, we discussed the role that re- 
search plays in science. Specifically, we stated that research is the primary 
vehicle by which science advances. Among other things, research has the 
capacity to answer questions, solve problems, and describe things, all of 
which may lead to an improvement in the way we live. But here is the 
essential point to remember: For a research study to change the way we 
live, or to have any effect at all, the researcher must share the results 
of the research with other people in the scientific community. Then, in 
turn, the information gleaned from the research study — regardless of 
whether it relates to technology, medicine, economics, or any other field 
of study — must ultimately be shared with the general public in one form 
or another. 

We would all likely agree that it would certainly do little good if a re- 
searcher who discovered something important decided to keep those re- 
sults quiet. Can you imagine how different the world would be if Thomas 
Edison had invented the light bulb, but then decided not to tell anyone 
about his invention? What if Albert Einstein had decided not to share his 
special and general theories of relativity? What if Bill Gates had decided to 
keep his computer technology all to himself? What if Jonas Salk had decided
that his polio vaccine should not be shared with other people? Clearly,
then, sharing the results of research studies is important, but let's take a 
closer look at why it is so important. After discussing the importance of
sharing the results of research studies, we will briefly discuss the various
outlets that are available to researchers who decide to share their results.



Sharing the Results of Research Studies 

There are several benefits to sharing the results of research studies. First, 
it adds to the knowledge base in a particular scientific field. As you know, 
science is essentially an accumulation of knowledge, and sharing research 
results adds an incremental amount of knowledge to what is already 
known about a particular topic. Thus, the dissemination of research results 
helps to advance the progress of science. 

Second, sharing the results of research ultimately improves the overall 
quality of the research being conducted. For example, when a researcher 
seeks to publish the results of his or her research in a professional journal, 
the manuscript describing the research is typically reviewed by several ed- 
itors who have special expertise in the topic area of the research. As we will 
discuss in the next section, the editors evaluate the quality of the study and 
the manuscript, and then they make a recommendation regarding whether 
the manuscript should be published in the journal. This is referred to as 
the peer-review process. Presumably, only the most well-conducted
studies and well-written manuscripts will make it through this
peer-review process to publication. As a result, the publication
process tends to weed out poorly conducted studies, which has the
effect of improving the quality of the research being conducted. In
summary, if researchers have an eye toward eventually publishing
the results of their studies, those researchers will need to ensure
that their studies are well designed and well conducted. We will talk more
about the publication process in the next section.

DON'T FORGET

Benefits of Sharing Research Results

1. Adds to the knowledge base in a particular scientific field.

2. Improves the overall quality of research being conducted.

3. Allows other researchers to replicate a study's results or extend
the study's findings.

4. Improves the way we live.

Third, sharing the results of research allows other researchers to evalu- 
ate the study's results in the context of other research studies. For example, 
other researchers may attempt to replicate the original study's findings, 
which we already established is an important component of scientific
research, or may even extend the original study's findings in perhaps
unanticipated ways. In either case, the original study's results are being
evaluated by other researchers in other contexts. This tends to function as a
quality check on the original research. 

Finally, for the results of a research study to have an effect on the way 
we all live, those results need to be shared with others. This is the point we 
addressed earlier in this section. To refresh your memory, we established 
that a ground-breaking research study would do little good if the re- 
searcher decided not to share the study's results with others. In fact, some 
would argue that the true test of a research study's value lies in its ability to 
improve some facet of the way we live. For that improvement to take place, 
a study's results need to be shared with other people. For example, when 
Bill Gates developed his revolutionary computer technology, that tech- 
nology had to be shared with others (e.g., scientists, manufacturers, dis- 
tributors, marketing firms), and then that technology had to be translated 
into something that would benefit the public at large — that is, personal 
computers for individual sale. 

Now that we have addressed the importance of sharing the results of re- 
search studies, let's take a closer look at the various options that are avail- 
able for researchers who desire to disseminate their research findings. 

Presentation of Research Results 

One option available to those researchers who decide to share the results 
of their research is to present their findings at professional conferences. 
Most scientific fields have guiding professional organizations that sponsor 
regularly held professional conferences. One of the primary functions of 
these conferences is to serve as outlets for the presentation of research
results that are relevant to that particular scientific field. Because
professional conferences are held so frequently, they provide for the
dissemination of up-to-date research findings. By contrast, the lag time between
completing a research study and the eventual publication of those results 
in a professional journal is typically much longer. As we will discuss in the 
next section, it can often take well over a year for a submitted manuscript 
to be published in a professional journal. By that time, the study's results 
may have been expanded upon, refuted, or made obsolete by other stud- 
ies. For these reasons, professional conferences are a valuable and efficient 
outlet for research results. 

Researchers have several options available to them in terms of present- 
ing their results at professional conferences. Although the format for pre- 
sentations differs from conference to conference, most conferences offer 
some combination of the following presentation formats: poster presen- 
tations, oral presentations, and symposiums. A poster presentation, as the 
name indicates, involves presenting the results of a research study in a 
poster format. At many conferences, this is a preferred presentation for- 
mat for students and beginning researchers (probably because there are 
many available presentation slots, which makes it less competitive than 
other presentation formats). An oral presentation involves speaking about 
the research results for a specified amount of time (sometimes as short as 
10 minutes). Finally, a symposium is a collection of related oral presentations
that are presented as a group. 

Getting to present the results of a research study at a professional con- 
ference is a competitive process. Typically, researchers submit short sum- 
maries of their research studies to the conference organizers who, in turn, 
ask reviewers to evaluate the research and determine whether the study is 
worthy of being presented at the conference. If the study is accepted, it
must then be determined whether it will be presented as a poster or an
oral presentation. At most conferences, it is generally considered more 
prestigious to have your study accepted as an oral presentation. Often, 
short summaries of the research — abstracts — are then published in a jour- 
nal so that people who did not attend the conference can become familiar 
with the results of the studies.

Publication of Research Results

Publication of research results is, by far, the most common method of dis- 
seminating the results of a research study. There are several publication 
options, including books, book chapters, monographs, newsletters, work- 
ing reports, technical reports, and Internet-based articles. However, pub- 
lication in a peer-reviewed professional journal is generally considered the 
primary and most valued outlet for the dissemination of research results 
(see Kazdin, 1992, 2003b). Let's take a closer look at publishing a research 
study's results in a peer-reviewed journal. 

Earlier in this chapter, we briefly discussed the peer-review process, 
which is the procedure used by most professional journals to determine 
which articles should be published. In this section, we will add a few com- 
ments to our previous discussion. Once a researcher completes a study, 
there are several decisions that need to be made (see Kazdin, 1992). The 
first is whether the study's findings merit publication. In other words, the 
researcher must determine, among other things, whether the study makes 
a valuable contribution to the field. If the researcher decides to seek pub- 
lication of the study's findings, he or she must then determine what aspects 
of the study should be published. In large studies, it may not be practical 
to publish the entire study in one manuscript, so it may need to be
subdivided in some rational manner. For example, if a research study has two
distinct parts, the researcher may decide to publish each part of the study
in a separate manuscript (but see Rapid Reference 9.1 for a word of
caution about doing this).

Publishing a Study's Results Begins in the Planning Stage

It is important to note that decisions made in the planning and design
stages of a research study have a direct effect on whether that study will
eventually be accepted for publication. Many of the decisions made in the
early stages of a study, such as what topic to study, what sample to use,
and which research design to implement, play an important role in
determining the overall quality and impact of the study, which are two
important considerations in whether it will later be published.

Least Publishable Units

Researchers must be careful to avoid breaking up a study into something
referred to as least publishable units. Although it is certainly desirable to
publish the results of a research study, most researchers agree that it is
not advisable to pad your curriculum vitae with more publications by
breaking up a study into the largest number of smallest publishable parts.
A study should be divided into separate manuscripts only if the division is
logically supported by the design of the study.

Having decided to publish the study, the researcher must then decide to 
which journal he or she will submit a manuscript describing the study. 
There may literally be hundreds of journals in a given scientific field, and 
the researcher must carefully determine which journal would be the most 
appropriate outlet for his or her research. It is important to note that, in 
some fields of study, researchers can submit a manuscript to only one jour- 
nal at a time. In these situations, the researcher must await a final publica- 
tion decision from the journal before submitting the manuscript to an- 
other journal (if necessary). Given that it can take several months, or 
perhaps even longer, for a manuscript to be reviewed and for a publication 
decision to be made, researchers must decide carefully where they will 
send their manuscripts. If time is of the essence, as it often is with research, 
choosing an appropriate journal is an extremely important decision. 

Once a researcher decides on a particular journal, he or she must pre- 
pare the manuscript in accordance with the style and formatting require- 
ments of the journal. Different journals — and even different fields of 
study — have different formatting and style requirements, and it is very 
important that researchers strictly adhere to those specifications. For 
example, in psychology (and related disciplines), the style and format of 
manuscripts are specified by the APA (2001). The final manuscript
consists of several different sections (see Rapid Reference 9.2) that describe
all aspects of the research study, including the rationale for the study,
related research, study procedures, statistical analyses, results, and
implications.

Typical Sections of a Manuscript

For manuscripts that describe empirical studies, the following sections are
typically included:

1. Title

2. Abstract (brief summary of the study)

3. Introduction (rationale and objectives for the study; hypotheses)

4. Method (description of research design, study sample, and research
procedures)

5. Results (presentation of data, statistical analyses, and tests of
hypotheses)

6. Discussion (major findings, interpretations of data, conclusions,
limitations of study, and areas for future research)

After the manuscript is submitted to a journal, the editor of the journal 
sends the manuscript to several reviewers who are asked to review the 
manuscript and make a publication recommendation. There are generally 
two categories of reviewers for journals: (1) consulting editors (who re- 
view manuscripts for the journal on a regular basis) and (2) ad hoc editors 
(who review manuscripts for the journal less frequently, typically on an as- 
needed basis). The reviewers are usually selected because of their knowl- 
edge and expertise in the area of the study (Kazdin, 1992). 

The reviewers evaluate each research study in terms of its substance, 
methodology, contribution to the field, and other considerations relating 
to the overall quality of the research study and the accompanying manu- 
script. It is also worth noting that, depending on the particular field of 
study, the editorial reviews may be either anonymous or signed. After all 
of the reviewers have completed their reviews and submitted their written 
comments to the journal editor, the journal editor makes a final publication
decision based on his or her evaluation of the manuscript and the
reviewers' written editorial comments.

Although journals differ with respect to how they handle manuscript 
submissions, most journals use some combination of the following publi- 
cation decisions: 

1. Accepted: The manuscript is accepted contingent on the author's
making revisions specified by the journal reviewers. Almost no 
manuscript is accepted for publication as submitted (i.e., with 
no revisions), and some accepted manuscripts may require sev- 
eral rounds of revisions before finally being published. 

2. Rejected: The manuscript is rejected, and the author will not be 
invited to revise and resubmit the manuscript for further publi- 
cation consideration. Manuscripts can be rejected for many dif- 
ferent reasons, including design flaws, an unimportant topic, 
and a poorly written manuscript. 

3. Rejected-resubmit: The manuscript is rejected, but the author is in- 
vited to revise and resubmit the manuscript for future publica- 
tion consideration. In this instance, the required revisions are 
typically extensive, and there is no guarantee that the manuscript 
will be published, even if all of the specified revisions are made. 

Most researchers would likely agree that going through the peer-review 
publication process can be both time consuming and humbling. Two as- 
pects of this process can be particularly difficult to handle for inexperi- 
enced and experienced researchers alike: First, the peer-review process is 
often excruciatingly slow. As previously noted, once a manuscript is sub- 
mitted to a journal, it can take several months for a publication decision to 
be made. If extensive revisions are required as a condition of publication, 
then it can take significantly longer than that. Even after a journal decides 
to publish the manuscript, it can take many more months — sometimes 
well over a year — for the article to finally be published. The slow pace of 
the peer-review publication process is often a source of frustration for
researchers. Moreover, it is possible for research results to become stale, or
obsolete, by the time that the results are finally published. 

Second, it is not easy to have your research evaluated, criticized, and 
(more often than not) rejected by journals. After putting a great deal of 
thought, energy, time, and money into a research study, it can be difficult 
to handle criticism and rejection. Yet rejection — and lots of it — is part of 
the business of conducting research. Some of the more prestigious
professional journals have rejection rates of over 90%, which means that they
are accepting for publication fewer than 1 manuscript out of every 10
that are submitted. Even seasoned and well-published researchers
experience their fair share of rejection. (At this point, it may seem that we should
comfort the reader by indicating that the rejection aspect of publishing
becomes easier over time, but we're not exactly sure that's true.) Despite the
frustrations associated with the peer-review process (in fact, perhaps
because of those frustrations), getting a research study published is a very
exciting and rewarding accomplishment.

PRINCIPLES OF RESEARCH DESIGN AND METHODOLOGY 

To assist you in digesting the large amount of material presented in this 
book, we have distilled some overarching principles of research method- 
ology that should be kept in mind when engaging in research. The follow- 
ing principles should serve as helpful guides as you engage in the process 
of designing and conducting a research study. 

Keep Your Eyes Open

Perhaps the most basic lesson to guide your research is to keep your eyes 
open. As we discussed in Chapter 2, many ideas for research studies are 
discovered simply by observation of the environment in which we live. It 
is often through the simple act of observation that researchers formulate 
their research ideas and choose their research questions. A keen eye to 
your surroundings may reveal questions that need to be answered, prob- 
lems that need to be solved, things that need to be improved, or phenom- 
ena that need to be described, all of which can be accomplished through 
well-designed and well-conducted research. Therefore, keeping your eyes
open is often the first step in the research process. 

Be An Empiricist 

The hallmark of being a good researcher is being an empiricist. As you may 
recall from Chapter 1, empiricists rely on the scientific method to acquire 
new knowledge. The scientific method's heavy emphasis on direct and 
systematic observation and hypothesis testing in the acquisition of new 
knowledge effectively distinguishes science from pseudo-science and 
nonscience. Moreover, to be able to draw valid conclusions based on your 
research, which is the goal of all research, it is essential that you adhere to 
the empirical approach. 

Be Creative 

Throughout this book, we have emphasized the importance of using an 
appropriate research design and sound methodology. As you know, en- 
gaging in well-designed research studies is the only way of ensuring that re- 
searchers can draw valid conclusions based on the results of their studies. 
Clearly, then, basing your research design and methodology on accepted 
scientific principles is an important consideration. 

It is also important, however, to be creative when conducting research. 
Creativity is particularly important in generating new research ideas, com- 
ing up with appropriate and perhaps novel research designs, and thinking 
about the implications of your research studies. Thinking outside the box 
has led to many great scientific discoveries. Good research is often as much 
art as it is science, so being creative is an important asset to the process. 

Research Begets Research 

This principle emphasizes the importance of following a logical progres- 
sion when conducting research. In other words, to have a coherent body of 
research, each research study should be the next logical step in the overall 
line of research. As we have repeatedly noted throughout this book, science
advances in small increments through well-conducted research studies. 
Therefore, it is important that research studies answer discrete questions 
that flow logically from prior research studies. Following this logical pro- 
gression of research ensures that research studies, and the findings gleaned 
from them, are based on a solid theoretical and empirical foundation. 

Adhere to Ethical Principles 

The importance of adhering to applicable ethical principles was discussed 
in detail in Chapter 8, but it cannot be overemphasized. The rights of study 
participants are of paramount importance in the research context, and 
protecting those rights takes precedence over all other research-related 
considerations. Violating applicable ethical guidelines may hurt the study 
participants, the reputation of the researchers who conducted the study, 
and, in some ways, the entire field of scientific research. Thus, researchers 
have an obligation to be aware of the ethical guidelines that govern the re- 
search that they are conducting. 

Have Fun 

This almost seems axiomatic, but we'll state it anyway. Try to have fun while 
conducting research. Conducting research can certainly be an arduous en- 
deavor, but it is important to have fun. As with anything else, if you are hav- 
ing fun while you do it, you will be more likely to become engaged in the 
process. Research can be exciting, so take pride in being part of something 
that will advance science and potentially improve the way we all live. 

CHECKLIST OF RESEARCH-RELATED CONCEPTS 
AND CONSIDERATIONS 

We have finally reached the concluding section of this book. In this sec- 
tion, we will present a convenient checklist of the major research-related 
concepts and considerations that we have covered. Although the following
checklist could not possibly contain every conceivable consideration
that researchers must take into account, it should serve to alert researchers 
to the major considerations that must be kept in mind when designing and 
conducting a research study. 

1. Follow the scientific method. The scientific method is what separates
science from nonscience. The scientific method, with its
emphasis on observable results, assists researchers in reaching 
valid and scientifically defensible conclusions. 

2. Keep the goals of scientific research in mind. The goals of scientific
research are to describe, predict, and understand or explain. 
Keeping these goals in mind will assist you in achieving the 
broad goals of science — that is, answering questions and ac- 
quiring new knowledge. 

3. Choose a research topic carefully. There are two considerations with 
respect to choosing a research topic. First, a research question 
must be answerable using available scientific methods. If a 
question cannot be answered, then it cannot be investigated 
using science. Second, it is important to make sure that the 
question you are asking has not already been definitively an- 
swered; this emphasizes the importance of conducting a thor- 
ough literature review. 

4. Use operational definitions. Operational definitions clarify exactly 
what is being studied in the context of a particular research 
study. Among other things, this reduces confusion and permits 
replication of the results. 

5. Articulate hypotheses that are falsifiable and predictive. As you may re- 
call, each hypothesis must be capable of being refuted based 
on the results of the study. Furthermore, a hypothesis must 
make a prediction, which is subsequently tested empirically by 
gathering and analyzing data. 

6. Choose variables based on the research question and hypotheses. The 
variables selected for a particular study should stem logically 
from the research question and the hypotheses. 

7. Use random selection whenever possible. Use random selection when
choosing a sample of research participants from the popula- 
tion of interest. This helps to ensure that the sample is repre- 
sentative of the population from which it was drawn. 

8. Use random assignment whenever possible. Use random assignment 
when assigning participants to groups within a study. Random 
assignment is a reliable procedure for producing equivalent 
groups because it evenly distributes characteristics of the 
sample among all of the groups within the study. This helps the 
researcher isolate the effects of the independent variable by en- 
suring that nuisance variables do not interfere with the inter- 
pretation of the study's results. 

9. Be aware of multicultural considerations. Be cognizant of the effects
that cultural differences may have on the research question and 
design. For certain types of research, such as treatment-based 
research, it is important to determine whether the intervention 
being studied has similar effects on both genders and on di- 
verse racial and ethnic groups. 

10. Eliminate sources of artifact and bias. To the extent possible, eliminate
sources of artifact and bias so that more confidence can
be placed in the results of the study. The effects of most types 
of artifact and bias can be eliminated (or at least considerably 
reduced) by employing random selection when choosing re- 
search participants and random assignment when assigning 
those participants to groups within the study. 

11. Choose reliable and valid measurement strategies. When selecting
measurement strategies, let validity and reliability be your 
guides. Measurement strategies should measure what they pur- 
port to measure, and should do so in a consistent fashion. 

12. Use rigorous experimental designs. Whenever possible, researchers 
should use a true experimental design. Only a true experimental 
design, one involving random assignment to experimental and 
control groups, permits researchers to draw valid causal inferences
about the relationship between variables. Because it may
not always be possible or feasible to use a true experimental de- 
sign, a good rule of thumb is that researchers should strive to 
use the most rigorous design possible in each situation. 

13. Attempt to increase the validity of a study. A well-conducted research 
study will have strong internal validity, external validity, con- 
struct validity, and statistical validity. This maximizes the likeli- 
hood of drawing valid inferences from the study. 

14. Use care in analyzing and interpreting the data. A crucial aspect of
research studies is preparing the data for analysis, analyzing the 
data, and interpreting the data. The proper analysis of a study's 
data enhances the ability of researchers to draw valid infer- 
ences from the study. 

15. Become familiar with commonly encountered ethical considerations.
Researchers have an obligation to avoid violating ethical stan- 
dards when conducting research. This means that researchers 
must be familiar with, among other things, the rights of study 
participants. 

16. Disseminate the results of research studies. Science advances through 
the dissemination of research findings, so researchers should 
attempt to share the results of their research with the scientific 
community. 
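
To make items 7 and 8 of the checklist concrete, the following is a minimal sketch of how random selection and random assignment might be carried out in Python. It is an illustration only, not a procedure prescribed in this book: the participant identifiers, the function names (random_selection, random_assignment), the sample size, and the seeds are all hypothetical. Random selection draws the sample from the population of interest, whereas random assignment shuffles that sample and deals participants into groups.

```python
import random


def random_selection(population, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from the population."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, sample_size)


def random_assignment(participants, groups=("experimental", "control"), seed=None):
    """Shuffle the participants, then deal them into the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(participant)
    return assignment


# Hypothetical population of 500 participant identifiers (an assumption).
population = [f"P{i:03d}" for i in range(1, 501)]

# Item 7: random selection of 40 participants from the population of interest.
sample = random_selection(population, 40, seed=1)

# Item 8: random assignment of the selected participants to two groups.
groups = random_assignment(sample, seed=2)

print(len(groups["experimental"]), len(groups["control"]))  # prints: 20 20
```

Dealing participants out of a shuffled list keeps the group sizes equal (or within one of each other), which is one common design choice; assigning each participant by an independent coin flip is also random assignment, but it can produce groups of unequal size.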

SUMMARY 

We have covered quite a bit of research-related information in this book, 
and we hope that you have learned a great deal about the process and im- 
portance of conducting well-designed research studies. We are confident 
that the material covered in this book will serve you well in your research 
endeavors, and we believe that this book will provide you with a solid 
foundation of research-related knowledge and skills. As you continue to 
develop as a researcher, we hope that the lessons learned from this book 
will remain in the forefront of your mind. 

TEST YOURSELF

1. The final step in a research study is ________ the results of the study.

2. The ________ process is used by journals to determine which
manuscripts should be accepted for publication.

3. Presentations and publications are two options available to researchers
who desire to share the results of their studies. True or False?

4. What are the three possible editorial decisions following the peer review
of a manuscript?

5. A ________ is a collection of related oral presentations that are
presented as a group at a professional conference.

Answers: 1. disseminating (or sharing or publishing); 2. peer-review; 3. True; 4. Accepted,
rejected, rejected-resubmit; 5. symposium